INTRODUCTION  TO  SOUPAC
A  USER'S  GUIDE 


A SOUPAC program consists, at least, of (1) IBM 360 system statements,
and (2) SOUPAC statements.  The IBM 360 system statements are used to give
instructions to the 360 computer, and the SOUPAC statements are used to
describe the types of data manipulations and statistical analyses that
are to be performed.  The set of SOUPAC statements in one job is called a
SOUPAC parameter deck, and it will include references to one or more
individual SOUPAC programs.  Note that all cards should be punched on an
IBM 029 keypunch.

Every  360  system  statement  must  be  on  a  separate  card  and  must  have  the 
characters  //  or  /*  in  columns  1  and  2.   In  particular,  the  system  cards 
for  a  SOUPAC  job  are  listed  here,  in  the  order  they  must  appear: 

CARD 1      /*ID      accounting information
CARD 2      //  EXEC  SOUPAC
CARD 3      //SYSIN  DD  *

            SOUPAC parameter deck

CARD 4      /*

Note  that  these  four  cards  are  an  absolute  minimum. 

CARD  1 

The  /*ID  card  is  punched  on  a  yellow  striped  card  available  in  the 
keypunch  area.   This  card  has  no  clipped  corners  and  is  used  only  as  an 
ID  card  for  jobs  to  be  run  on  the  360.   The  first  five  columns  of  this  card
must  contain  the  characters  /*ID  followed  by  a  blank.   This  is  the  only 
card  in  your  deck  which  must  be  punched  on  a  special  card;  the  rest  of  your 
deck  may  be  punched  on  any  of  the  standard  corner  cut  cards  found  in  the 
keypunch  area. 

Sample ID cards:

/*ID PS=9999,DEPT=AGRON,NAME=SMITH

/*ID NAME='BOB_SMITH',PS=8899,DEPT=VOTEC,LINES=5000

/*ID CODE=SWITCH,PS=9988,NAME=STEPPENWULF,DEPT=REC

The accounting information consists of the following keywords and
responses:

Necessary:

KEYWORD           RESPONSE

PS=               your problem specification number

DEPT=             your department

NAME=             your name

Optional:

KEYWORD           RESPONSE

TIME=(min,sec)    The optional TIME parameter on the ID card is the
TIME=(,sec)       estimate of execution time for the job.  By default
TIME=min          it is 1 minute.  The time estimate on the ID card is
                  the time estimate for the SOUPAC job.  If there is a
                  time estimate on the // EXEC SOUPAC card, there must
                  be at least as much time on the ID card.

IOREQ=n           where n is the number of INPUT/OUTPUT requests.  The
                  default is 1000.  Moving from cards to a temporary
                  storage location generates one I/O request per
                  observation.  Moving to or from a temporary storage
                  location generates two I/O requests per observation.
                  Cards read in or lines printed out do not generate
                  I/O requests.  The interactive nature of SOUPAC
                  programs often necessitates considerable amounts of
                  data transfer.  For this reason, I/O requests for
                  SOUPAC jobs are relatively large.

CODE=ZZZZZ        CODE is the signal that your problem specification
                  number is code word protected.  If you have code word
                  protection, get the code word from your approved
                  problem specification request form.  If you do not
                  have code word protection, do not use the keyword
                  CODE.

LINES=XXXXX       You request XXXXX lines of output.  The default is
                  2000.

CARDS=XXXX        You request XXXX cards punched.  The default is NONE.


There  must  be  no  blank  spaces  in  the  accounting  information  and  the  order 
of  the  accounting  information  makes  no  difference.   Blanks  in  a  name,  such 
as  JOHN  USER,  are  not  allowed.   The  name  must  be  only  one  word  or  of  the 
form  JOHNUSER.   If  blanks  are  desired,  then  the  name  must  be  enclosed  in 
apostrophes  and  an  underscore  substituted  for  the  blank,  e.g.  'JOHN_USER'.

Note further that commas separate one keyword response from the next
keyword, and that the equal sign is written as part of each keyword (for
example, PS=).  Furthermore, your ID card parameters may not go beyond
column 71.  To continue an ID card, put a comma after the last parameter
that will fit completely into 71 columns or less and proceed on a second
card which has the characters /* punched in columns 1 and 2, followed by
at least one blank.  This second card may be any standard corner cut card.
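
The ID card rules above can be checked mechanically before a deck is submitted. The following Python sketch is only an illustration and not part of SOUPAC; the function name check_id_card is hypothetical, and the check covers a single card only (continuation cards and quoted names containing commas are not handled).

```python
# Hypothetical pre-submission check for one /*ID card, based on the rules
# stated in this guide.  Not a SOUPAC facility.
REQUIRED = ("PS=", "DEPT=", "NAME=")

def check_id_card(card):
    """Return a list of rule violations for one /*ID card (empty if none)."""
    problems = []
    if not card.startswith("/*ID "):
        problems.append("columns 1-5 must hold /*ID followed by a blank")
    if len(card.rstrip()) > 71:
        problems.append("parameters extend beyond column 71")
    accounting = card[5:].strip()
    if " " in accounting:
        problems.append("blank inside the accounting information")
    fields = accounting.split(",")
    for keyword in REQUIRED:
        # Each necessary keyword must begin one of the comma-separated fields.
        if not any(field.startswith(keyword) for field in fields):
            problems.append("missing necessary keyword " + keyword)
    return problems

print(check_id_card("/*ID PS=9999,DEPT=AGRON,NAME=SMITH"))
print(check_id_card("/*ID PS=9999,NAME=JOHN USER"))
```

The first call reports no problems; the second reports the blank in the name and the missing DEPT= keyword.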

Further  information  about  additional  parameters  is  available  from  either 
the  Service  Programming  Area  or  the  SOUPAC  office. 

CARD  2 

The // EXEC SOUPAC card is punched as in the example above, with two
slashes in columns 1 and 2, followed by at least one blank.  This is
followed by the word EXEC, at least one blank, and then the word SOUPAC.

There  is  in  fact  another  way  to  get  a  SOUPAC  job.   One  can  say: 

//   EXEC   SOUP 
//SYSIN  DD   * 

The  essential  differences  are  these:   SOUPAC  has  15  available  temporary 
storage  locations;  SOUP  has  only  5  temporary  storage  locations.   SOUP, 
however,  runs  about  five  seconds  faster  than  SOUPAC  and  is  generally 
recommended  for  users  who  do  not  require  more  temporary  storage  locations 
than  S1  through  S5.

The  amount  of  memory  to  be  used  is  specified  on  each  //   EXEC  SOUPAC 
or  //   EXEC   SOUP  card  rather  than  on  the  ID  card.   SOUPAC  jobs  are  given 
a  default  region  of  150K  which  is  adequate  for  nearly  all  SOUPAC  jobs. 
See  the  section  on  options  to  a  SOUPAC  job.

CARD  3 

The next card is copied as in the example, with //SYSIN beginning in
column 1 and extending through column 7 without any blanks.  This is
followed by at least one blank, the two characters DD, at least one blank,
and finally the character *.

SOUPAC  Statements 


The  SOUPAC  parameter  deck  is  divided  into  two  main  sections.   The 
first  consists  of  the  problem  program  parameter  cards.   Individual  problem 
program  write-ups  are  available  to  users  through  the  SOUPAC  office.   The 
last  card  of  the  problem  program  parameter  cards  must  always  contain  the  words 
END  SOUPAC  as  the  first  non-blank  characters.   The  END  SOUPAC  card  is 
immediately  followed  by  any  data  decks  you  may  have.   All  data  decks  are 
preceded  by  a  DATA  format  card  and  followed  by  an  END#  card.   The  format 
card  has  the  word  DATA  as  the  first  four  non-blank  characters.   After  the  word 
DATA  may  be  any  comments  the  user  may  want,  if  any.   These  comments  will 
be  printed  on  the  user's  output,  but  will  be  otherwise  ignored  by  SOUPAC. 
Next come the data card parameters.  The user must specify the number of
variables per row of data to be input, and may optionally specify the
number of rows of data (observations).  Following this information is a
standard FORTRAN type format enclosed in parentheses.  This is the format
which describes the structure of the data deck it precedes.  The following
forms of the data format card are valid:

DATA(observations,variables)(format string ≤ 592 non-blank characters)

DATA(,variables)(format string ≤ 592 non-blank characters)

DATA(variables)(format string ≤ 592 non-blank characters)

Remember  that  all  data  decks  must  be  terminated  by  END#  as  the  first 
four  non-blank  characters  on  a  single  card. 

Data  used  for  calculations  should  be  read  in  either  E  or  F  format. 
Data  not  used  for  calculations  can  be  read  in  any  format. 
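
As an illustration of the DATA format card forms above, the following Python sketch assembles such a card. It is not part of SOUPAC; the function name data_card is hypothetical, and counting the enclosing parentheses toward the 592-character limit is an assumption.

```python
# Hypothetical helper that builds a DATA format card from the forms listed
# in this guide.  Not a SOUPAC facility.
def data_card(variables, fmt, observations=None):
    """Return a DATA card; observations is optional, as in the guide."""
    body = fmt.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("the FORTRAN format must be enclosed in parentheses")
    # Assumption: the 592-character limit counts every non-blank character,
    # including the enclosing parentheses.
    non_blank = sum(1 for ch in body if ch != " ")
    if non_blank > 592:
        raise ValueError("format string exceeds 592 non-blank characters")
    if observations is None:
        return "DATA(%d)%s" % (variables, body)
    return "DATA(%d,%d)%s" % (observations, variables, body)

print(data_card(3, "(3F5.2)"))
print(data_card(3, "(3F5.2)", observations=100))
```

The F edit descriptors in the sample format follow the recommendation above that data used for calculations be read in E or F format.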

END  CARDS

Within  a  SOUPAC  parameter  deck,  three  types  of  END  cards  are  used: 

END#:    This  card  is  used  to  end  each  data  deck  which  appears.   For  each 
DATA  format  card  used,  there  must  be  a  corresponding  END#  card 
at  the  end  of  the  data  deck. 

END S:   This card appears once and only once per SOUPAC parameter deck
and must appear as the last card in the parameter deck.  Its function
is to separate the SOUPAC parameter deck from the data decks.

END P:   This card is used to indicate the end of any program which requires
subparameter cards and must appear for all such programs.  (In this
context $-control cards are not taken to be subparameter cards.)
Any program which needs the END P card explicitly states so within
its individual program write-up.  For all other programs, use of the
END P card is an error.  If a program which uses the END P card
is also the last program to be executed, it must be terminated by an
END P card, and then an END S card must appear to indicate that no more
cards follow.

CARD  4

The  last  physical  card  in  your  deck  must  contain  the  characters  /* 
in  columns  1  and  2.   No  other  punches  should  appear  on  this  card. 

Sample  SOUPAC  deck  setup: 

/*ID  NAME=SMITH,DEPT=PEM,PS=6894,LINES=5000,CARDS=500,IOREQ=10000
//   EXEC   SOUPAC 
//SYSIN  DD   * 

program  1  ()()()(  ) 

program  2  ()()()(  ) 

program  n  ()()()(  ) 

END  SOUPAC 
DATA 

END# 
/* 
Note that there may be more programs than data decks or more data decks
than programs.  In any case, for programs that read data decks, the user
must ensure that the order of data decks corresponds to the order of
programs  that  read  data  decks  since  the  first  program  to  read  a  data  deck 
will  read  the  first  available  deck,  and  so  on.   The  process  continues  until 
either  the  SOUPAC  program  list  is  exhausted  or  a  program  reading  data  cannot 
find  a  data  deck  available.   Notice  that  extra  data  decks  are  ignored. 
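
The pairing of data decks with programs described above can be sketched as follows. This Python fragment is only an illustration of the stated rule and not SOUPAC's own logic; the function name assign_decks and the program and deck labels are hypothetical.

```python
# Hypothetical model of how data decks are consumed in order by the
# programs that read cards.  Not a SOUPAC facility.
def assign_decks(programs, decks):
    """programs: list of (name, reads_cards) pairs in deck order.
    decks: list of data deck labels in the order they follow END S.
    Returns (assignments, unused_decks, starved_program)."""
    remaining = list(decks)
    assignments = []
    for name, reads_cards in programs:
        if not reads_cards:
            continue                      # program takes its input elsewhere
        if not remaining:
            # A program reading data cannot find a deck: the process stops.
            return assignments, [], name
        assignments.append((name, remaining.pop(0)))
    # Any decks still remaining are simply ignored.
    return assignments, remaining, None

programs = [("CORRELATION", True), ("FACTOR", False), ("MOV", True)]
print(assign_decks(programs, ["deck1", "deck2", "deck3"]))
```

Here the first deck goes to the first program that reads cards, the second deck to the next such program, and the third deck is left over and ignored.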

A  FINAL  NOTE 

Any  changes  the  user  may  want  to  specify  in  the  job  control  language 
of  SOUPAC  must  follow  the  standard  OS/360  conventions. 
OPTIONS  TO  A  SOUPAC  JOB:
PARMS,  PROLOG  CARDS,  AND  $-CONTROL  CARDS


PARMS 


Parms are arguments to the keyword 'Parameter', contracted into
the keyword 'PARM', which give instructions to a processor running under a
360 system.  In this context, SOUPAC is a processor running under a 360
system.  PARMS are always coded on an EXEC card and have the following form:

//   EXEC  SOUPAC,PARM='OPT1,OPT2,...,OPTm'

The permissible options to be used as SOUPAC PARMS are listed below with an
explanation of their use and function.  Note that the default is underlined;
that is, // EXEC SOUPAC is equivalent to // EXEC SOUPAC,PARM='OPT1,OPT2,...'
where each underlined PARM is to be taken as one of the list of options in
the PARM string in the example.  These PARMS give the SOUPAC system
instructions in the same way that parameters give SOUPAC statistical or data
management programs instructions.

1.  NODYNAM  or  DYNAM 

NODYNAM  implies  that  a  non-dynamically  allocatable  version  of  the
library  of  statistical  procedures  is  to  be  used.   This  version  will 
run  in  some  150K  of  core  and  will  handle  a  lesser  number  of 
variables  than  the  dynamically  allocatable  version.   DYNAM  will  use 
the  dynamically  allocatable  version  of  any  program  requested  which 
will  handle  more  variables  in  an  arbitrarily  specified  amount  of 
core  above  a  certain  minimum.   If  using  DYNAM,  see  the  SOUPAC 
consultants  for  a  handout  on  optimal  region  sizes  for  particular 
numbers  of  variables. 

2.  EXECUTE  or  NOEXECUTE 

NOEXECUTE implies that the SOUPAC parameter deck, which the
Syntax Interpreter scans to build intermediate parameters,
should not be executed.  NOEXECUTE indicates that only a syntax
check is to be performed.  If EXECUTE is specified and no errors
are found by the Syntax Interpreter, the job step will proceed.
If EXECUTE is specified and errors are found by the Syntax
Interpreter, execution of the step may continue depending upon
whether LET or NOLET is also specified.

3.  NOLET  or  LET 

If  an  error  is  found  by  the  Syntax  Interpreter  and  EXECUTE  has 
been  specified,  execution  will  proceed  only  if  LET  was  also 
specified.   In  this  case,  execution  will  proceed  only  through  the 
last  program  processed  which  was  completely  error  free.   If  NOLET 
was  specified  and  errors  are  found  by  the  Syntax  Interpreter, 
execution  will  not  be  permitted. 

4.  LIST  or  NOLIST

LIST  indicates  that  all  program  cards  are  to  be  listed.   NOLIST 
indicates  that  only  the  prolog  section  of  the  SOUPAC  parameter 
deck  is  to  be  listed. 
5.  PGM  or  NOPGM

PGM  indicates  that  a  complete  SOUPAC  parameter  deck  and  data  decks 
follow.   NOPGM  indicates  that  only  the  prolog  section  and  data 
deck  follow,  and  that  the  intermediate  parameters  are  being  provided 
by  the  user  by  over-riding  the  cataloged  procedure.   This  implies 
that  the  user  has  previously  run  a  SOUPAC  job  and  has  saved  the 
two  necessary  data  sets  so  that  he  may  run  the  same  program  again. 
Before saving these data sets, a user should visit the SOUPAC
office to ensure that it is done correctly.

If  any  error  is  found  by  the  Syntax  Interpreter  in  the  prolog  section, 
the  job  step  will  not  continue. 

If  the  job  step  which  generated  the  intermediate  parameter  data 
sets  found  syntax  errors,  execution  of  the  job  step  in  which 
NOPGM  is  specified  will  continue  (if  EXECUTE  is  specified)  through 
the last program processed which was completely error free,
regardless of whether LET or NOLET was specified in either job step.

Examples : 

To  do  just  a  syntax  check: 

//   EXEC  SOUPAC,PARM='NOEXECUTE'

To execute up to the first program found to have syntax errors:

//   EXEC  SOUPAC,PARM='LET'

To execute up to the first program found to have syntax errors and
use the dynamically allocatable library:

//   EXEC  SOUPAC,PARM='LET,DYNAM'

Note  that  the  PARMS  may  be  listed  in  any  order. 
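
The interaction of EXECUTE, NOEXECUTE, LET and NOLET can be summarized in a short Python sketch. It is an illustration only, not SOUPAC's Syntax Interpreter. It assumes that the first option named in each pair above is the default, which is consistent with the examples but is not stated outright in this text; the special NOPGM case is not modeled, and the names resolve_parms and step_runs are hypothetical.

```python
# Hypothetical model of how a PARM string decides whether a job step runs.
# Assumption: the first option of each pair is the default.
PAIRS = [("NODYNAM", "DYNAM"), ("EXECUTE", "NOEXECUTE"),
         ("NOLET", "LET"), ("LIST", "NOLIST"), ("PGM", "NOPGM")]

def resolve_parms(parm=""):
    """Return the effective set of options; PARMS may come in any order."""
    chosen = {opt.strip() for opt in parm.split(",") if opt.strip()}
    known = {opt for pair in PAIRS for opt in pair}
    if chosen - known:
        raise ValueError("unknown PARM: " + ", ".join(sorted(chosen - known)))
    resolved = set()
    for default, alternative in PAIRS:
        if default in chosen and alternative in chosen:
            raise ValueError("conflicting PARMS: %s and %s" % (default, alternative))
        resolved.add(alternative if alternative in chosen else default)
    return resolved

def step_runs(parm, syntax_errors):
    """True if execution proceeds at all after the syntax scan."""
    options = resolve_parms(parm)
    if "NOEXECUTE" in options:
        return False                  # syntax check only
    # With errors, execution proceeds only under LET (and then only through
    # the last completely error-free program).
    return (not syntax_errors) or "LET" in options

print(step_runs("", syntax_errors=True))
print(step_runs("LET,DYNAM", syntax_errors=True))
```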

B.   PROLOG  OF  A  SOUPAC  JOB 

Described  below  are  several  #  control  cards  which  may  appear  in  the 
prolog  of  a  SOUPAC  job.   Within  the  prolog  these  control  cards  may 
appear  in  any  order.   If  prolog  control  cards  are  used,  they  must  appear 
immediately  after  the  SYSIN  card.   The  Syntax  Interpreter  determines 
the  end  of  the  prolog  when  it  reads  a  card  which  is  not  one  of  these 
types.   All  types  have  parameters  and  must  be  terminated  by  a  period. 
Prolog  cards  may  not  have  continuation  cards;  hence  all  parameter
information  must  be  punched  within  80  columns.   There  is  no  limit 
to  the  number  of  prolog  cards  permitted  nor  is  there  any  restriction  on 
the  number  of  any  one  type.   If  conflicting  information  is  entered,  the 
information  entered  last  overrides  any  previous  definitions. 

1.   #REPEAT  OPTION

The  #REPEAT  OPTION  is  used  to  repeat  sections  of  a  SOUPAC 
parameter  deck  an  optional  number  of  times.   The  #REPEAT  card 
which  appears  in  the  prolog  section  will  be  followed  by  up  to
22  (twenty-two)  integer  parameters  which  will  indicate  the  number 
of  repetitions  of  up  to  22  repeat  sequences.   The  card  sequences  to 
be  repeated  will  be  preceded  and  followed  by  #SREP  and  #EREP 
cards  respectively.   Example: 

/*ID
//   EXEC  SOUP
//SYSIN  DD  *
#REPEAT (2).
<additional program cards>
#SREP
CORRELATION (C)( )(S1).
SQUARE ROOT FACTOR ANALYSIS (S1)(P(F))(20)(C)(P(F)).
#EREP
END S

In  this  example  the  program  sequence  of  CORRELATION  and  SQUARE  ROOT 
FACTOR  ANALYSIS  will  be  repeated  twice.   Four  card  input  data 
sets  would  be  required  for  the  repeated  sections. 

Repeat  sequences  which  begin  before  a  main  program  and  end  in 
a  subprogram  or  which  begin  in  a  subprogram  and  do  not  end  in  the 
same  subprogram  are  not  allowed.   Nested  or  overlapping  repeat  sequences 
are  not  allowed.   Also  a  #SREP  card  cannot  be  immediately  followed 
by  a  #EREP  card  and  a  single  appearance  in  the  deck  of  either  card 
will  cause  an  error. 
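
The effect of #REPEAT with #SREP and #EREP can be pictured as a simple expansion of the card deck. The Python sketch below is an illustration only, not the Syntax Interpreter's own method; the function name expand_repeats is hypothetical, and the subprogram restrictions above are not checked.

```python
# Hypothetical expansion of #SREP ... #EREP sequences.  counts holds the
# integers from the #REPEAT card, one per repeat sequence, in order.
def expand_repeats(cards, counts):
    out, block, inside = [], [], False
    pending = list(counts)
    for card in cards:
        tag = card.strip()
        if tag == "#SREP":
            if inside:
                raise ValueError("nested repeat sequences are not allowed")
            inside, block = True, []
        elif tag == "#EREP":
            if not inside:
                raise ValueError("#EREP without a preceding #SREP")
            if not block:
                raise ValueError("#SREP immediately followed by #EREP")
            if not pending:
                raise ValueError("no repetition count left on #REPEAT")
            out.extend(block * pending.pop(0))   # the sequence appears n times
            inside = False
        elif inside:
            block.append(card)
        else:
            out.append(card)
    if inside:
        raise ValueError("#SREP without a matching #EREP")
    return out

deck = ["#SREP", "CORRELATION (C)( )(S1).", "FACTOR (S1)(P).", "#EREP", "END S"]
for card in expand_repeats(deck, [2]):
    print(card)
```

The program cards in the sample deck are abbreviated placeholders. With a count of 2 the two program cards appear twice in the expanded deck, which is why four card input data sets are needed in the example above.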

2.   #V-UNIT  OPTION 

#V-UNIT allows the user to change input and output addresses
in the execution of one SOUPAC job.  The form of a #V card is as follows:

#Vn (m)(A1)...(Aj).

where n is an integer 1 through 9; thus there can be at most 9
variable addresses, namely V1 through V9; and m is a counter which
determines how many times a variable address may be used before it
assumes the next value in its list of possible values.  A1 ... Aj
are addresses which Vn assumes.  These can be any valid address.
At the moment, however, forms like (S1/P) will not work.  Note that
CARDS and PRINT are permitted.

Finally, the list of addresses is cyclic; that is, if, after
Aj has been used, Vn occurs again in the program, Vn will have the
value A1, and so on.

/*ID
//   EXEC  SOUP
//SYSIN  DD  *
#V9(1)(S1)(S2)(S3)(S4).
#V5(1)(S1)(S2)(S3)(S4).
#REPEAT (4).
MAT.
#SREP
MOV (C)(V9).
#EREP
HOR (V5)(V5)(V5)(V5)(S5).
END P
END S

This program segment reads 4 separate card decks, saving them in
temporary storage, and horizontally augments them into one data set.

The equivalent without the use of #REPEAT and #V would be as follows:

/*ID
//   EXEC  SOUP
//SYSIN  DD  *
MAT.
MOV(C)(S1).
MOV(C)(S2).
MOV(C)(S3).
MOV(C)(S4).
HOR(S1)(S2)(S3)(S4)(S5).
END P
END S

Note that the MOV(C)(V9). statement is expanded into four move statements
and V9 takes the values S1 through S4.  Similarly, V5 takes on the
values of S1 through S4.
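
The cyclic behavior of a variable address can be sketched in Python as follows. This is an illustration of the rule as described above, not SOUPAC's implementation; the class VariableAddress and the function substitute are hypothetical names, and m is read as the number of uses each address receives before the next one is taken.

```python
import re

# Hypothetical model of a #Vn (m)(A1)...(Aj). card.
class VariableAddress:
    def __init__(self, uses_per_value, addresses):
        self.m = uses_per_value
        self.addresses = list(addresses)
        self.used = 0

    def take(self):
        """Return the current address and count one more use."""
        index = (self.used // self.m) % len(self.addresses)   # cyclic list
        self.used += 1
        return self.addresses[index]

def substitute(cards, table):
    """Replace each V1..V9 occurrence, left to right, card by card."""
    pattern = re.compile(r"\bV([1-9])\b")
    return [pattern.sub(lambda hit: table["V" + hit.group(1)].take(), card)
            for card in cards]

table = {"V9": VariableAddress(1, ["S1", "S2", "S3", "S4"]),
         "V5": VariableAddress(1, ["S1", "S2", "S3", "S4"])}
cards = ["MOV (C)(V9)."] * 4 + ["HOR (V5)(V5)(V5)(V5)(S5)."]
for card in substitute(cards, table):
    print(card)
```

Applied to the repeated MOV card and the HOR card of the example, the substitution yields the same addresses as the expanded deck shown above.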

3.   #OLD  OPTION

The  #OLD  option  is  used  to  define  the  number  of  rows  in  a
sequential  data  set  created  by  a  previously  run  SOUPAC  job.   The 
number  of  rows  is  then  entered  into  a  table  in  the  monitor.   This 
option  should  be  used  whenever  the  header  record  on  the  data  set 
is  not  known  to  have  a  correct  value  for  the  number  of  rows,  and  the 
user  does  not  want  to  execute  a  MATRIX  MOVE  to  count  the  rows.   To 
use  the  option,  punch  a  card  with  #OLD  in  the  first  four  columns. 
Then  code  the  address  and  the  number  of  rows  in  the  usual  SOUPAC 
fashion.   The  number  of  columns  may  be  coded  on  the  card  if  desired, 
but  will  be  totally  ignored.   Include  this  card  in  the  prolog  section 
of  the  SOUPAC  job. 

For  example,  to  indicate  that  a  data  set  to  be  input  from 
SEQUENTIAL  1  has  77  rows  you  would  prepare  the  following  card: 

#OLD  (S1)(77).
4.   #TEST  OPTION

There is also available a #TEST option; however, this facility
is complicated and intended for testing purposes within the SOUPAC
office and has no significant advantage for the general user.

5.   #DEFINE  OPTION 

Whenever  the  user  wishes  to  specify  the  dimensions  of  a  direct  access 
data  set  (DISK  address),  punch  #DEFINE  in  the  first  seven  columns  of 
a  card  followed  by  the  address,  number  of  rows  and  number  of  columns 
coded  in  the  usual  SOUPAC  fashion.   Include  this  card  in  the  prolog 
section  of  your  program.   For  double  precision  matrices,  code  the  same 
number  of  rows,  but  twice  as  many  columns  as  otherwise.   DISK  1  and  DISK  2 
have  default  definitions  of  1+50  rows  by  k^O   columns  single  precision. 
If the user desires any other dimensions on these data sets, #DEFINE must
be used.  If the user desires to use any DISK address other than DISK 1
and DISK 2, #DEFINE must be used in addition to supplying the necessary DD
cards.

For example, to define a data set for DISK 17 with 20 rows and 40
columns double precision, you would prepare the following card:

#DEFINE (DISK 17)(20)(80).
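
The doubling of the column count for double precision matrices is easy to overlook. The Python sketch below, with the hypothetical function name define_card, builds a #DEFINE card under the rule given above; it is an illustration and not part of SOUPAC.

```python
# Hypothetical helper that builds a #DEFINE prolog card.
def define_card(disk, rows, columns, double_precision=False):
    # Double precision: same number of rows, twice as many columns.
    coded_columns = columns * 2 if double_precision else columns
    card = "#DEFINE (DISK %d)(%d)(%d)." % (disk, rows, coded_columns)
    if len(card) > 80:
        # Prolog cards may not have continuation cards.
        raise ValueError("prolog card does not fit in 80 columns")
    return card

print(define_card(17, 20, 40, double_precision=True))
```

The call reproduces the card from the example: 40 double precision columns are coded as 80.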

Notice  that  all  prolog  cards  start  with  a  #  in  column  one  and  must  occur 
before  any  SOUPAC  program  parameter  cards.   A  #-card  in  the  middle  of 
the  SOUPAC  program  parameter  deck  is  treated  as  a  comment.   There  is, 
however,  a  #-control  card,  while  not  strictly  a  prolog  card,  which  may 
occur  in  the  SOUPAC  program  parameter  deck  and  will  not  be  treated  as 
a  comment.   This  is  the  #-zero  card  and  is  the  only  exception  to  the 
statement  about  #  cards  being  comments  if  in  the  middle  of  the  deck. 
The  #-zero  card  is  essentially  a  debugging  tool  to  facilitate  reading 
of  dumps  if  one  is  needed.   It  has  no  particular  use  for  the  user. 


C.   $-CONTROL  CARDS

$-CONTROL CARDS are used to provide additional information
to a SOUPAC program above and beyond what is included in the parameters.
There are 3 $-control cards.  All must begin in column one with the character
$ and then continue across the card without blank columns.

1.   $C-B 

The $C-B card provides as its arguments the variables to
be used as control breaks for a program which accepts control
breaks.  The use of this card with a program which does
not accept control breaks is an error.  The form of this card
is as follows:

$C-B(V1)(V2)...(Vn).

where V1 through Vn are variable numbers and n must be less than or
equal to 24.
2.   $INP

$INP has as its arguments a string of input addresses.
The form is:

$INP(A1)...(An).

where A1 through An are input addresses including cards.
The number of addresses will be determined by the program
accepting the $INP card and will be explicitly mentioned in the
program write-up.

3.   $OUT

$OUT has as its arguments a string of output addresses.  The form is:

$OUT(A1)...(An).

The form is the same as that for $INP, and the number of addresses
is also determined by the program accepting the $OUT card.
Multiple output addresses will be accepted.  See the section on
Input/Output multiple addresses.
SOUPAC  INPUT-OUTPUT  AND  TEMPORARY  STORAGE

I.  GENERAL 

A.  Input  and  Output  as  Data  Types 

Consider  a  set  of  data  which  a  researcher  wants  intercorrelated. 
To  do  correlations  there  is  in  the  SOUPAC  library  of  statistical  procedures
a  correlation  program.   Input  to  the  correlation  program  is  the  researcher's 
raw  data;  output  from  the  correlation  program  is  a  matrix  of  correlation 
coefficients.  Similarly,  every  conceivable  program  has  a  particular  input; 
in  fact,  perhaps  several  inputs,  and  some  output. 

The  nature  of  the  input  and  output  of  a  particular  program  will 
depend  on  the  program  and  its  intent.   For  example,  raw  data  variables 
are  input  into  a  correlation  program  which  outputs  a  correlation  matrix. 
But  a  factor  analysis  program  expects  as  input  a  correlation  matrix, 
and  yields  as  output  a  factor  matrix.   In  contrast  to  the  singular 
relation  of  the  nature  of  input  and  output  to  a  particular  statistical 
program,  every  program  finds  its  input  somewhere  and  must  put  its  output 
somewhere.

B.  Input  and  Output  as  Data  Sources 

SOUPAC is designed in such a manner that the researcher can tell
any program where his inputs are and where to put his outputs.  Punched cards
are an obvious input source; printed pages are an obvious output source.
But  the  nature  of  a  punched  card  deck  input  into  a  correlation  program  would 
be  that  of  raw  data  variables.   In  the  SOUPAC  system  input  and  output  sources 
are  also  called  addresses.   Thus,  a  possible  input  address  for  a  correlation 
program  is  cards  and  a  possible  output  address  for  correlation  coefficients 
is  print.   Input  and  output  addresses  are  parameters  to  every  program  in 
the  SOUPAC  system.   As  the  researcher  reads  a  particular  program  write-up 
he  will  notice  that  the  order  of  the  parameters  determines  the  nature  of 
his  input  or  output  and  his  supplying  an  input  or  output  address  determines 
whether  or  not  he  uses  or  gets  the  particular  inputs  and  outputs. 

II.  ELEMENTARY  INPUT/OUTPUT  ADDRESS  AND  TEMPORARY  STORAGE 

A.   Possible elementary input and output addresses in the SOUPAC
system are these:

INPUT:    CARDS, SEQUENTIAL 1, SEQUENTIAL 2, . . . SEQUENTIAL 15

OUTPUT:   PRINT, SEQUENTIAL 1, SEQUENTIAL 2, . . . SEQUENTIAL 15
          (See section on punched cards)


Again,  CARDS  and  PRINT  are  obvious  sources.   SEQUENTIAL  1  through 
SEQUENTIAL  15,  however,  are  input  or  output  names  of  15  temporary  storage 
regions  available  to  the  researcher  in  the  SOUPAC  system.   These  15 
temporary  storage  regions  are  provided  for  exactly  that  purpose,  temporary 
storage  of  data.   Notice  that  with  this  facility  a  user  can  save  his 
correlation  matrix,  for  example,  at  SEQUENTIAL  1  and  then  give  SEQUENTIAL
1  as  an  input  address  to  a  factor  analysis  program.   Or  a  researcher  can 
construct  a  copy  of  his  data  on  temporary  storage  and  then  let  any  number 
of  programs  use  the  same  data  as  input  from  the  same  input  address,  saving 
him  the  effort  of  making  multiple  copies  of  his  card  deck  so  that  each 
program  would  read  its  own  deck.   Finally,  temporary  storage  addresses 
enable  the  saving  of  intermediate  results  for  further  processing  or 
modification  by  other  programs  and  thereby  enable  the  researcher  to 
construct  his  own  analysis  procedure  by  providing  the  appropriate  inputs 
and  outputs  to  the  right  programs  at  the  right  times. 

B.   SOUP  vs  SOUPAC  with  Respect  to  Temporary  Storage 

There  are  two  ways  of  invoking  the  SOUPAC  system.   One  can  ask 
for  SOUPAC  or  SOUP.   Note  that  all  15  temporary  storage  regions  are  allocated 
to  SOUPAC,  while  only  SEQUENTIAL  1  through  SEQUENTIAL  5  are  allocated  to 
SOUP.   Asking  for  SEQUENTIAL  6  through  SEQUENTIAL  15  when  running  under
SOUP  will  cause  an  error  and  terminate  the  job. 

All of these input-output addresses may be abbreviated as follows:

CARDS            C

PRINT            P

SEQUENTIAL 1     S1  (or T1)
.  .  .
SEQUENTIAL 15    S15 (or T15)

T1 through T15 are alternative abbreviations for SEQUENTIAL 1 through
SEQUENTIAL 15.  T1 through T15 are, in fact, abbreviations of TAPE 1
through TAPE 15.  SEQUENTIAL 1 through SEQUENTIAL 15 and their abbreviations
are the recommended uses.  The T1 through T15 notation reflects a real
technical distinction but has been kept to enable programs using that notation
to run.
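
The abbreviations and the SOUP limit on temporary storage can be combined in one small check. The Python sketch below is an illustration only; the function name normalize_address is hypothetical, and accepting the long form TAPE n as input is an assumption, since the text only names T1 through T15 as abbreviations of it.

```python
import re

# Hypothetical normalizer for elementary addresses.  Returns the short
# recommended form (C, P, or S1..S15).
def normalize_address(address, system="SOUPAC"):
    text = " ".join(address.upper().split())
    if text in ("CARDS", "C"):
        return "C"
    if text in ("PRINT", "P"):
        return "P"
    match = re.fullmatch(r"(?:SEQUENTIAL|TAPE|S|T) ?(\d{1,2})", text)
    if not match:
        raise ValueError("not an elementary address: " + address)
    number = int(match.group(1))
    # SOUPAC is allocated 15 temporary storage regions, SOUP only 5.
    limit = 15 if system == "SOUPAC" else 5
    if not 1 <= number <= limit:
        raise ValueError("SEQUENTIAL %d is not available under %s" % (number, system))
    return "S%d" % number

print(normalize_address("SEQUENTIAL 1"))
print(normalize_address("T15"))
```

Calling normalize_address("S6", system="SOUP") raises an error, mirroring the job termination described above.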

C.   Multiple  Output  Addresses 

A  researcher  may  want  to  output  to  several  sources:   he  may  desire
to  both  print  and  save  some  results  for  later  use.   He  cannot,  however, 
input  from  more  than  one  source  for  a  particular  input  address.   The  facility 
of  multiple  output  addresses  has  the  following  construction: 

(output address 1/output address 2/output address 3).

This  is  the  completely  general  form  providing  for  up  to  three  separate 
outputs.  Each output must be a different source, however.  Thus, (S1/P/X)
is a valid multiple output address providing for temporary storage at S1,
a print of the same data, and a punched copy of the data.  (See section on
punch for explanation of X.)  (S1/P) will print and store but not punch.
The order of the addresses makes no difference: (S1/P) is equivalent to
(P/S1).  Forms such as (S1/S2), however, are not permitted, nor are (P/P)
or (X/X): one can output only to one sequential and only once to P or X.
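
The restrictions on multiple output addresses can be expressed as a short validity check. The following Python sketch is an illustration only and not SOUPAC's own parser; the function name check_output_address is hypothetical, only the abbreviated forms S1 through S15, P and X are recognized, and the print forms P(E) and P(F) described later in this guide are counted as P.

```python
import re

# Hypothetical validity check for a multiple output address such as (S1/P/X).
def check_output_address(address):
    text = address.strip().upper()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("address must be enclosed in parentheses")
    parts = [part.strip() for part in text[1:-1].split("/")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("at most three output addresses are allowed")

    def kind(part):
        if part in ("P", "P(E)", "P(F)"):
            return "P"
        if part == "X":
            return "X"
        if re.fullmatch(r"S([1-9]|1[0-5])", part):
            return "S"
        raise ValueError("unknown output address: " + part)

    kinds = [kind(part) for part in parts]
    for letter in ("P", "X", "S"):
        # Only one sequential, and only one each of P and X.
        if kinds.count(letter) > 1:
            raise ValueError("each output must be a different source")
    return parts

print(check_output_address("(S1/P/X)"))
print(check_output_address("(P/S1)"))
```

Forms such as (S1/S2) or (P/P) raise an error. Whether a particular program accepts the full three-address form still depends on its write-up.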

The  above  general  form  is  available  only  if  the  output  address  in 
the  particular  program  is  marked  with  an  Q. 

In  all  cases,  however,  the  form 

(output address 1/output address 2)

is  valid  unless  the  program  write-up  explicitly  has  a  restriction. 

D.   Print  is  F  Form  of  Output  Print  Address. 

There  is  yet  another  form  to  output  addresses.   This  form  is 
available  only  where  the  researcher  finds  the  symbol  Q   in  the  program  write- 
up  and  has  to  do  with  the  kind  of  printed  output.   For  technical  reasons, 
most  programs  print  in  a  form  called  E-format  which  is  a  form  of 
scientific  notation.   This  form  allows  the  computer  to  print  numbers  of 
any  size.   Some  programs,  for  which  the  output  numbers  are  known  to  be 
constrained,  as  in  correlation  coefficients,  however,  print  in  a  form 
called F-format which is ordinary decimal number representation.  F-format
generally cannot print numbers larger than a pre-determined size.  The
size of number depends on the nature of a researcher's data, but the program
has no way of knowing this; hence the most general form, E-format, is used.

The researcher, however, can optionally specify F-format.  To print in
F-format he would use the following output address:

(P(F))  or  (P(F)/S1)  if he wanted a multiple output address.

Those programs which print in F-format already, as for
correlation coefficients, can be made to print in E-format by using the
following output address:

(P(E))  or  (P(E)/S15).

The different forms look like this:

E-format        Scientific Notation   F-format        Decimal Number
                                                      Representation

+0.12345E 06    +1.2345 x 10^5        +123456.12345   +123456.12345

All four numbers have the same value correct to 5 places.  Notice that F-format
cannot represent a number greater than 999999.99999 in absolute value
whereas E-format can represent the first 5 digits of any number of order of
magnitude up to 10^99.  The numbers of digits illustrated for E and F
formats are the pre-determined limits for the size of numbers.  E-format is
the more general form but F-format is easier to read.
correlation matrix, for example, at SEQUENTIAL 1 and then give SEQUENTIAL
1  as  an  input  address  to  a  factor  analysis  program.   Or  a  researcher  can 
construct  a  copy  of  his  data  on  temporary  storage  and  then  let  any  number 
of  programs  use  the  same  data  as  input  from  the  same  input  address,  saving 
him  the  effort  of  making  multiple  copies  of  his  card  deck  so  that  each 
program  would  read  its  own  deck.   Finally,  temporary  storage  addresses 
enable  the  saving  of  intermediate  results  for  further  processing  or 
modification  by  other  programs  and  thereby  enable  the  researcher  to 
construct  his  own  analysis  procedure  by  providing  the  appropriate  inputs 
and  outputs  to  the  right  programs  at  the  right  times. 

B.   SOUP  vs  SOUPAC  with  Respect  to  Temporary  Storage 

There  are  two  ways  of  invoking  the  SOUPAC  system.   One  can  ask 
for  SOUPAC  or  SOUP.   Note  that  all  15  temporary  storage  regions  are  allocated 
to  SOUPAC,  while  only  SEQUENTIAL  1  through  SEQUENTIAL  5  are  allocated  to 
SOUP.   Asking  for  SEQUENTIAL  6  through  SEQUENTIAL  15  when  running  under 
SOUP  will  cause  an  error  and  terminate  the  job. 

All of these input-output addresses may be abbreviated as follows:

CARDS             C
PRINT             P
SEQUENTIAL 1      S1  (or T1)
SEQUENTIAL 15     S15 (or T15)

T1 through T15 are alternative abbreviations for SEQUENTIAL 1 through
SEQUENTIAL 15. T1 through T15 are, in fact, abbreviations of TAPE 1
through TAPE 15. SEQUENTIAL 1 through SEQUENTIAL 15 and their abbreviations
are the recommended uses. The T1 through T15 notation reflects a real
technical distinction but has been kept to enable programs using that notation
to run.

C.   Multiple  Output  Addresses 

A  researcher  may  want  to  output  to  several  sources:   he  may  desire 
to  both  print  and  save  some  results  for  later  use.   He  cannot,  however, 
input  from  more  than  one  source  for  a  particular  input  address.   The  facility 
of  multiple  output  addresses  has  the  following  construction: 

(output address 1/output address 2/output address 3).

This is the completely general form providing for up to three separate
outputs. Each output must be a different source, however. Thus, (S1/P/X)
is a valid multiple output address providing for temporary storage at S1,
a print of the same data, and a punched copy of the data. (See section on
punch for explanation of X). (S1/P) will print and store but not punch.
The order of the addresses makes no difference. (S1/P) is equivalent to
(P/S1). Forms such as (S1/S2), however, are not permitted, nor are (P/P)
or (X/X): one can output only to one sequential and only once to P or X.

The  above  general  form  is  available  only  if  the  output  address  in 
the  particular  program  is  marked  with  an  tt. 

In  all  cases,  however,  the  form 

(output address 1/output address 2)

is  valid  unless  the  program  write-up  explicitly  has  a  restriction. 
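
The rules for multiple output addresses stated in this section can be
summarized in a short sketch. The Python function below is only an
illustration and is not part of SOUPAC; the name check_output_address is
hypothetical. It assumes the abbreviated forms P, X, and S1 through S15
(or T1 through T15) only, and it ignores both the print-format variants of
P and any restriction made in an individual program write-up.

```python
import re

def check_output_address(address):
    """Return True if an output address such as '(S1/P/X)' obeys the stated rules."""
    if not (address.startswith("(") and address.endswith(")")):
        return False
    parts = address[1:-1].split("/")
    if not 1 <= len(parts) <= 3:            # at most three separate outputs
        return False
    kinds = []
    for part in parts:
        if part in ("P", "X"):              # print and punch
            kinds.append(part)
            continue
        match = re.fullmatch(r"[ST](\d{1,2})", part)
        if not match or not 1 <= int(match.group(1)) <= 15:
            return False                    # e.g. C is not an output address
        kinds.append("SEQ")                 # every sequential counts as one kind
    # Only one sequential, and P or X at most once each.
    return len(kinds) == len(set(kinds))
```

Under these assumptions (S1/P/X) and (P/S1) are accepted, while (S1/S2),
(P/P) and (C/P) are rejected.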

D.   Print  is  F  Form  of  Output  Print  Address. 

There  is  yet  another  form  to  output  addresses.   This  form  is 
available  only  where  the  researcher  finds  the  symbol  fl  in  the  program  write- 
up  and  has  to  do  with  the  kind  of  printed  output.   For  technical  reasons, 
most  programs  print  in  a  form  called  E-format  which  is  a  form  of 
scientific  notation.   This  form  allows  the  computer  to  print  numbers  of 
any  size.   Some  programs,  for  which  the  output  numbers  are  known  to  be 
constrained,  as  in  correlation  coefficients,  however,  print  in  a  form 
called F-format which is ordinary decimal number representation. F-format
generally  cannot  print  numbers  larger  than  a  pre-determined  size.   The 
size  of  number  depends  on  the  nature  of  a  researcher's  data,  but  the  program 
has  no  way  of  knowing  this,  hence,  the  most  general  form,  E-format  is  used. 

The researcher, however, can optionally specify F-format. To print in
F-format he would use the following output address:

(P(F))  or  (P(F)/S1)

the latter if he wanted a multiple output address. Those programs which
print in F-format already, as for correlation coefficients, can be made to
print in E-format by using the following output address:

(P(E))  or  (P(E)/S15).

The different forms look like this:

E-format        Scientific Notation    F-format         Decimal Number
                                                        Representation

+0.12345E 06    +1.2345 x 10^5         +123456.12345    +123456.12345

All four numbers have the same value correct to 5 places. Notice that F-format
cannot represent a number greater than 999999.99999 in absolute value
whereas E-format can represent the first 5 digits of any number of order of
magnitude up to 10^99. The numbers of digits illustrated for E and F
formats are the pre-determined limits for the size of numbers. E-format is
the more general form but F-format is easier to read.
In this example of E-format the E 02 part is to be understood as
10^2. E 03 would be 10^3 and E-04 would be 10^-4. Thus, .376 E 03 is
.376 x 10^3 or 376. while .1294E-01 is .1294 x 10^-1 or .01294. The sign
following the E determines which way to move the decimal point; left for
negative, right for blank or positive. The number following the sign or
blank determines how many places to move the decimal point.
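
The rule for reading E-format can be checked with a few lines of Python.
This is a sketch for illustration only: the function name parse_e_format is
hypothetical, and the code assumes the printed form is a mantissa, the
letter E, and a signed or blank-prefixed exponent, as in the examples above.

```python
from decimal import Decimal

def parse_e_format(text):
    """Convert an E-format string such as '.376 E 03' to an exact decimal value."""
    mantissa, exponent = text.upper().split("E")
    # A blank or '+' after the E means move the point right; '-' means left.
    places = int(exponent.replace(" ", ""))
    return Decimal(mantissa.replace(" ", "")).scaleb(places)

# The same value shown the F-format way, with five decimal places.
print(format(parse_e_format("+0.12345E 06"), "+.5f"))
```

Decimal arithmetic is used here instead of binary floating point so that
moving the decimal point introduces no rounding of its own.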

E.  Punched  Output   (Don't  forget  to  specify  CARDS=  on  ID  Card!) 

All  programs  which  have  output  addresses  marked  with  the  symbol 
Q  can  punch  output  directly  by  using  the  X  output  address.   X  is  the 
abbreviation  for  cards  as  output.   C  used  as  an  abbreviation  for  an  output 
address  will  be  an  error.   Punched  output  generated  by  the  use  of  the 
X output address will be in E-format. (See section above). X(F) is not
a  valid  form  and  will  be  an  error. 

If  punched  output  is  desired  in  a  form  other  than  E-format  or 
from  a  program  which  does  not  allow  the  X  output  address,  then  the 
researcher  must  make  a  copy  of  his  data  on  temporary  storage  and  go  to 
the  MATRIX  program  and  use  the  PUNCH  instruction  provided  in  that 
program. 

F.  Obtaining  Additional  Input/Output  Sources 

It happens that 15 temporary storage locations may not be
enough. Additional temporary storage may be obtained by calling
for S16 through S40. Use of S16 through S40 requires the addition of Job
Control Cards to the 360 system cards of the SOUPAC program deck. At
least the first time, the researcher should check with SOUPAC consultants
before doing this: firstly, to learn to do it correctly if he doesn't know how
already, and secondly, if he knows how, to make sure none of the Job Control
Language has been changed or modified. Such changes can happen due to 360
system changes or reconfigurations, or SOUPAC system changes, and may not be
announced, in contrast to SOUPAC program changes, which would have been
announced.

If in special instances even 40 temporary storage regions are not
sufficient or a situation arises where so-called DISK temporary storage is
required, there can be made available temporary storage regions called
DISK1 through DISK40. Check with the SOUPAC consultants before using
these for the proper Job Control Cards and the proper SOUPAC prolog cards.

G.  Using  Owner  Data  Sources  or  Special  Input/Output  Requirements 
in  the  SOUPAC  system 

Users'  own  tapes  or  disk  packs  can  be  used  with  the  SOUPAC 
system  for  input  or  output.   See  Appendix  A  for  additional  information. 
Special input/output requirements can usually be handled in the
SOUPAC  system  provided  the  requirements  can  be  handled  by  the  360  system 
at  all.   In  such  cases  check  with  the  SOUPAC  consultants. 

General  problem  types  of  the  nature  alluded  to  above  would  be 
multiple  file  volumes,  blocked  input/output,  formatted  or  unformatted 
input/output,  different  kinds  of  record  lengths  and  different  forms  of  data 
representation  due  to  machine  differences  or  differences  in  facilities 
at  other  computer  installations. 


February,  1971 

Using  Owner  Storage  Devices  With  SOUPAC 
Introduction 

User  data  storage  mediums  for  our  360  user  will  typically  be  dismountable 
tape  or  disk  packs  or  they  will  be  user  data  sets  on  some  dismountable  or 
permanently  mounted  disk  pack.  Whatever  the  source,  it  must  be  described  fully 
so  that  the  machine  can  access  it.   This  information  is  needed  as  arguments 
to  certain  JCL  keywords.   These  parameters  will  be  treated  in  a  moment.   First 
we  want  to  consider  the  dismountable  devices,  tapes  and  disk  packs.   User  disk 
packs  are  not  as  frequent  as  user  tapes  so  we  shall  ignore  them  for  the  moment. 
There  are  several  "physical"  characteristics  of  tapes  which  pertain  to  recording 
techniques which must be considered before we get into the "logical" attributes
described  by  the  JCL  parameters  in  the  next  section. 

Physical  tape  parameters  are  density,  number  of  tracks  written  and  recording 
mode.   Density  can  be  200,  556,  800,  or  1600  bits  per  inch,  abbreviated  BPI.   Thus 
a  tape  written  at  800  BPI  can  store  less  information  per  inch  than  one  written  at 
1600 BPI. Note that a tape to be used on the 360 must be written at 1600 BPI. It
is true that we have a dual density tape drive, namely 800/1600 BPI, accessible to
the 360 and that there is a density parameter in JCL enabling one to specify the
density, but since the 1600 mode is so much faster, a tape to be used on the 360
should generally be at 1600 BPI.

The number of tracks used in recording is 7 or 9. The 360 usually uses a 9
track recording mode; the 7094 used the 7 track mode. Recording mode is unformatted
or  formatted  but  we  usually  speak  of  binary  tapes  or  card  image  tapes,  respectively. 
If  it  is  a  card  image  tape  it  is  BCD  or  EBCDIC. 

Thus,  the  most  desirable  input  tape  for  the  360  is  1600  BPI,  9  track,  360  binary 
or  EBCDIC.   If  one  or  more  of  these  characteristics  is  different,  then  it  should  be 
changed  by  doing  a  tape  to  tape  copy  with  the  appropriate  conversion.   Asking  for 
special exceptions is not advised; i.e., do not try getting permission to write
800 BPI except in very special circumstances.

At  this  installation  we  can  do  the  following  conversions: 

556  BPI  to  800  BPI  to  1600  BPI  and  vice  versa 
7  track  to  9  track  and  vice  versa 
BCD  to  EBCDIC  and  vice  versa 

Note that in going from EBCDIC to BCD some characters may be lost. These
conversions are at the moment done by the 360/50 by requesting a tape to tape
copy for which the above operations are requested.

We can also do 360 binary to EBCDIC conversion and vice versa but we do not
have the facility to do anything about a 7094 binary tape. Not that it cannot be
done,  but  of  the  two  alternative  means  we  know  of,  one  uses  IBM  conversion  routines 
which  would  require  hardware  changes;  the  other  requires  a  non-trivial  program 
which  we  do  not  have  and  even  when  we  get  it,  it  will  be  slow  and  clumsy. 

We  repeat  that  there  are  JCL  keywords  to  specify  density,  number  of  tracks, 
and parity but it is too slow and clumsy for the 360/75.

Clearly  the  greater  the  density,  the  more  information  can  be  stored  on  a  tape 
and  the  faster  it  can  be  read  or  written.   The  higher  density  does  however  also 
require  a  better  tape.   One  should  not  write  1600  BPI  on  a  tape  tested  at  800  BPI. 
Further  only  a  1600  BPI  tested  tape  can  be  reliably  written  with  9  tracks.   This 
test  BPI  information  is  usually  on  the  tape  reel  in  the  form  of  a  comment  saying 
tested  at  1600  BPI  or  800  BPI.   Note  that  1600  BPI  and  3200  FCI  mean  the  same  thing. 

There  are  several  things  to  keep  in  mind  when  writing  a  user  tape  in  binary 
or  card  image  mode.   Card  image,  line  image,  and  formatted  tape  all  mean  essentially 
the  same  and  mean  exactly  what  their  names  imply.   Clearly  all  these  are  read  and 
written  with  a  format.   Binary  and  unformatted  tapes  read  and  write  faster  than 
formatted  tapes  but  may  take  up  more  space  than  formatted  tapes.   How  much  data 
a  user  wants  always  accessible  and  how  important  time  is  will  probably  determine 
what  mode  to  use.   A  card  image  tape  may  be  read  many  different  ways  using  formats 
to pick off variables. An unformatted tape has variables immediately usable but
perhaps not everything the user has on his cards. He can put everything he may
ever want to use on his unformatted tape but he may also be dragging some
superfluous variables around every time he does a read.

Choice,  though  possible,  is  typically  not  available  for  number  of  tracks  on 
tapes  written  by  the  360.   You  get  9  track  tapes. 

So,  the  usual  360  output  tape  is  1600  BPI,  9  track,  formatted  or  unformatted. 
SOUPAC  writes  1600  BPI,  9  track,  unformatted  tapes.   To  get  a  1600  BPI,  9  track, 
formatted  tape  out  of  SOUPAC  the  easiest  thing  to  do  at  the  moment  is  to  over-ride 
the  punch  statement  in  the  proc,  i.e.,  FT07F001.   To  input  a  formatted  tape  to 
SOUPAC,  over-ride  the  procedure  in  an  appropriate  way  and  use  the  MATRIX  INPUT 
instruction. To input a SOUPAC written unformatted tape just over-ride the
procedure.

Over-riding  the  SOUPAC  procedure  for  using  user  data  storage  devices 

SETUP  or  not  SETUP 

Users' data exists on permanently mounted volumes such as UIDCS1 or on
dismountable volumes such as UIUSRU or DK0013 or user tapes or disk packs. In
this context, a volume is that physical disk pack or tape which gets mounted
onto a disk drive or tape drive. If a volume is permanently mounted you need do
nothing at this setup stage. If it is dismountable, you need a /*SETUP card for
each such volume among your /* cards, if any, after the /*ID card. Tapes always
require a /*SETUP card. For data sets on disk packs, determine from the user
whether it is dismountable or not. The form of a /*SETUP card is as follows:

/*SETUP UNIT=type,ID=name
or
/*SETUP UNIT=type,ID=(name,label)

Type is DISK or TAPE: the choice is obvious.

For tapes, name under ID will be OXXXXX for an owner tape where XXXXX is some
tape sequence number or PXXXXX or LXXXXX where P and L are pool and lease tapes
rented from DCS.

A  tape  may  be  internally  labeled  or  not.   If  it  is  not  labeled  the  label 
argument  is  NL,  meaning  no  label.   If  it  is  labelled  the  argument  will  be  SL, 
meaning  standard  label.   If  it  is  standard  labelled,  later  in  the  over-ride  of 
the  proc  you  will  need  to  know  the  DSNAME. 

If  no  label  argument  is  used  for  tapes  the  default  is  SL.   For  dismountable 
disk  packs  at  DCS,  the  volume  name  is  all  that  is  needed  since  they  are  all 
internally standard labelled. You must find out from a user if his tape or
non-DCS disk pack is internally labelled or not.

Over-riding  the  SOUPAC  proc 

Assuming  you  now  have  the  user  volume  mounted,  you  must  still  tell  SOUPAC 
about  it.   Suppose  that  the  user  volume  is  a  sequential  data  set  (or  possibly 
more  than  one  sequential  data  set).   Usually  they  will  be.   Direct  access  data 
sets  will  be  discussed  later.   You  must  assign  it  a  SOUPAC  sequential  data  set 
reference number. Numbers 11 through 49 are available and correspond to the
SOUPAC addresses S1 through S39 (or T1 through T39). Which data set reference
number you use does not make any difference. For sake of example let us choose
S1 and over-ride FT11F001 in the SOUPAC proc. Our JCL should look like this to
this point:


/*ID
/*SETUP UNIT=type,ID=(name,label)
// EXEC SOUPAC
//FT11F001 DD indication string
//SYSIN DD *
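
The correspondence described above between SOUPAC addresses and data set
reference numbers (S1 with 11, up to S39 with 49) amounts to adding 10 to
the address number. The following Python sketch makes this explicit. It is
an illustration only; the function name ddname_for is hypothetical and the
FTnnF001 pattern is taken from the examples in this handout.

```python
def ddname_for(address):
    """Map a SOUPAC address 'S1'..'S39' (or 'T1'..'T39') to the DD name to over-ride."""
    prefix, number = address[0].upper(), int(address[1:])
    if prefix not in ("S", "T") or not 1 <= number <= 39:
        raise ValueError("expected S1 through S39 or T1 through T39")
    # Data set reference numbers 11 through 49 correspond to S1 through S39.
    return f"FT{number + 10:02d}F001"

print(ddname_for("S1"))   # the DD name used in the example above
```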


The  indication  string  has  some  or  all  of  the  parameters  listed  below.   Use  all 
except  those  where  an  option  is  mentioned. 


UNIT=type            type is the same as that on the SETUP card;
                     if no setup then it is DISK

VOL=SER=name         name is the same as that on the SETUP card;
                     if no setup, it is the name of a permanently
                     mounted volume

LABEL=NL             same as label argument if used on SETUP card;
LABEL=SL             if not used on SETUP card, can be omitted

DISP=                arguments are NEW, OLD, KEEP, DELETE, CATALG,
                     UNCATALG, PASS singly or in combination
                     depending on what you are doing

DSNAME=              some name; use if you have a standard labelled
                     tape; otherwise optional depending on what you
                     are doing

SPACE=(kind,(primary,secondary))
                     not necessary for an OLD, PASS, or CATALG data set;
                     necessary for a NEW data set; not necessary for a
                     NEW user tape data set

     kind=TRK        a TRK holds 7200 bytes
          CYL        a CYL has 20 tracks
                     there are other kind parameters, but you probably
                     should not be using them

     primary         number of space units of the given kind you
                     initially request; you get this space at once; if
                     it is not possible to allocate you get a space
                     unavailable message

     secondary       if you run off the end of your primary allocation
                     you get 15 extents of the number you specify as
                     secondary of the space units you specify as kind

                     if the machine cannot give you 15 extents or
                     if after 15 extents you still need space you
                     get a space unavailable message

DCB=arg 1=arg 2      optional depending on what you are doing
DCB=(arg 1,arg 2,arg 3)


DCB arguments are as follows:

RECFM=
     F    fixed record format (used for formatted I/O)
     V    varying record format (usually used for unformatted I/O)
     U    undefined record format (used for formatted or
          unformatted I/O)
     B    blocked record format
     A    alphameric record format; usually just on SYSOUT

     these may be used in combination as in RECFM=FB. Note if B is used
     you must also use LRECL and BLKSIZE.

LRECL=n              n is the logical record length in bytes
BLKSIZE=m            m is the blocksize in bytes
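
To get a feel for the SPACE parameter, the arithmetic implied by the table
can be written out. The Python sketch below is illustrative only: the
function name max_bytes is hypothetical, and the figures (7200 bytes per
track, 20 tracks per cylinder, 15 secondary extents) are simply those quoted
above. The result is an upper bound that ignores any overhead for blocking.

```python
TRACK_BYTES = 7200       # "a TRK holds 7200 bytes"
TRACKS_PER_CYL = 20      # "a CYL has 20 tracks"
MAX_EXTENTS = 15         # secondary extents granted after the primary

def max_bytes(kind, primary, secondary):
    """Largest data set, in bytes, that SPACE=(kind,(primary,secondary)) can hold."""
    if kind == "TRK":
        unit = TRACK_BYTES
    elif kind == "CYL":
        unit = TRACK_BYTES * TRACKS_PER_CYL
    else:
        raise ValueError("kind must be TRK or CYL")
    # Primary allocation plus at most 15 secondary extents.
    return (primary + MAX_EXTENTS * secondary) * unit

print(max_bytes("TRK", 10, 5))   # SPACE=(TRK,(10,5))
```

If the data set outgrows this figure, the job ends with the space
unavailable message mentioned in the table.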
If no DCB is specified you get the OS default DCB. You may not necessarily
want this. Ascertain from the user what DCB he has and use it. For more
information on DCB's check your FORTRAN user's guide. But, we mention several
pitfalls already encountered and several points of information. SOUPAC by
default uses:

DCB=(RECFM=V,LRECL=796,BLKSIZE=800)

If you write a formatted data set with DISP=PASS you get RECFM=F whether you
specify it or not and SOUPAC, unless you specify DCB, will kick you off with the
message that it has an F or U RECFM but is expecting V. If you had a tape written
with format conversion in another job and did not specify DCB=RECFM=F you will
probably get an illegal decimal character message. If you give SOUPAC a data
set with DCB=RECFM=F some way or another, SOUPAC will be unable to write on it
since it writes unformatted unless you use the MATRIX PUNCH instruction and
over-ride FT07F001. If you over-ride FT07F001 remember you get a SOUPAC DATA
card image and an END# card image. If you write large formatted strings or many
variables, you should check on default LRECL and BLKSIZE. If your record is larger
than LRECL, you just spill onto the next logical record and the machine will worry
about it. If your record is larger than BLKSIZE, you get at best that piece of
the record that fits into BLKSIZE and the rest is lost. There are several tricky
things that are going on here, but usually you need not worry about them. If
your record is shorter than LRECL, then most of LRECL is unused and you may be
wasting space. This becomes important when you start getting space unavailable
or end of volume messages. Furthermore it is inefficient, though it is often much
more convenient just to take the defaults. Again, consult your FORTRAN user's
guide for more details.


EXAMPLES 

Write  onto  a  users  tape  a  SOUPAC  data  set  in  binary 


/*ID
/*SETUP UNIT=TAPE,ID=(OXXXXX,NL)
// EXEC SOUPAC
//FT12F001 DD UNIT=TAPE,VOL=SER=OXXXXX,LABEL=NL,DISP=(NEW,KEEP)
//SYSIN DD *
TRANSFORMATIONS(IA)(T2)(13)


Write  onto  a  users  tape  a  SOUPAC  data  set  in  EBCDIC 


/*ID
/*SETUP UNIT=TAPE,ID=(OXXXXX,NL)
// EXEC SOUPAC
//FT07F001 DD UNIT=TAPE,VOL=SER=OXXXXX,LABEL=NL,DISP=(NEW,KEEP)
//SYSIN DD *
MATRIX.
PUNCH(IA)"(format)"

note that you will get the SYSPUNCH default DCB which limits you to card images,
i.e., 80 bytes per record. For larger records use your own DCB.


Read  a  users  formatted  tape  into  SOUPAC 


/*ID
/*SETUP UNIT=TAPE,ID=(OXXXXX,SL)
// EXEC SOUPAC
//FT49F001 DD UNIT=TAPE,VOL=SER=OXXXXX,LABEL=SL,DISP=(OLD,KEEP),
//   DSN=USER.PYXYY.TPDATA,DCB=(RECFM=F,LRECL=80,BLKSIZE=800)
//SYSIN DD *
MATRIX.
INPUT(T39)(T1)(NCOL)"(format)"

note that you are reading a card image tape with 10 card images per physical
record, i.e., blocking factor of 10 to 1.
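
The blocking factor quoted in this note follows directly from the DCB
values in the example. A minimal Python sketch, with the hypothetical
function name blocking_factor and assuming fixed-length records whose block
size is a whole multiple of the record length:

```python
def blocking_factor(lrecl, blksize):
    """Number of fixed-length logical records held in one physical block."""
    if lrecl <= 0 or blksize <= 0:
        raise ValueError("LRECL and BLKSIZE must be positive")
    if blksize % lrecl != 0:
        raise ValueError("BLKSIZE must be a whole multiple of LRECL")
    return blksize // lrecl

print(blocking_factor(80, 800))   # LRECL=80, BLKSIZE=800 as in the example
```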


Read  a  users  unformatted  tape  into  SOUPAC  written  by  SOUPAC 

/*ID
/*SETUP UNIT=TAPE,ID=(OXXXXX,NL)
// EXEC SOUPAC
//FT13F001 DD UNIT=TAPE,VOL=SER=OXXXXX,DISP=(OLD,KEEP),LABEL=NL
//SYSIN DD *
COR(T3)(P)(P).

note this presupposes that SOUPAC wrote the tape originally


Read a users unformatted tape into SOUPAC not written by SOUPAC

/*ID
/*SETUP UNIT=TAPE,ID=(OXXXXX,NL)
// EXEC SOUPAC
//FT11F001 DD UNIT=TAPE,VOL=SER=OXXXXX,DISP=(OLD,KEEP),LABEL=NL,
//   DCB=(RECFM=V,LRECL=400,BLKSIZE=404)
//SYSIN DD *
#OLD(T1)(NROW)(NCOL).
MATRIX.
MOVE(T1)(T2).

note first of all you need the #OLD card to simulate a SOUPAC header record.
Further each record must contain three extra integer *4 words at the front of
the record to correspond to SOUPAC records. Also, if it was written with default
DCB you do not need to specify a DCB. Finally, the #OLD card is good to use if
you are having header record problems in SOUPAC anyway.

Read  a  user  data  set  from  a  dismountable  disk 

/*ID
/*SETUP UNIT=DISK,ID=DK0291
// EXEC SOUPAC
//FT12F001 DD UNIT=DISK,VOL=SER=DK0291,DSN=USER.PXXX.name,
//   DISP=(OLD,KEEP),DCB=(RECFM= ,LRECL= ,BLKSIZE= )
//SYSIN DD *

Use INPUT if formatted; otherwise go through the appropriate sequence for
unformatted reads. To write the data set, use the SPACE parameter; change the
DISP parameter appropriately; and consider what kind of write you want and fix
the over-ride appropriately.

If  you  intend  to  generate  data  in  SOUPAC  and  wish  to  use  the  data  in  any  context 
other  than  SOUPAC,  it  is  highly  recommended  that  you  do  not  simply  save  the  binary 
image  file  but  generate  a  formatted  record  data  set.   Formatted  record  data  sets 
interface  more  completely  with  other  systems. 


CORRECTION  TO  HANDOUT  ON  USING  OWNER  STORAGE  DEVICES  WITH  SOUPAC 

On  page  5  the  statements: 

LABEL=NL 
LABEL=SL 

should  read 

LABEL=(,NL) 
LABEL=(,SL) 

therefore also in the examples every statement which has in it an occurrence of
the phrase LABEL=NL or LABEL=SL should have that phrase changed to LABEL=(,NL)
or LABEL=(,SL), respectively.

For your information:

LABEL=(,NL) or LABEL=(,SL) is equivalent to LABEL=(1,NL) or LABEL=(1,SL)
which in fact means file 1 on the volume. Thus to access file 2 you would
write LABEL=(2,NL) or LABEL=(2,SL), or in general, LABEL=(n,XX) where n is
the file number and XX is NL or SL.
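
The general form LABEL=(n,XX) can be illustrated with a small Python
helper. This is a sketch only and not part of SOUPAC or of the job control
language; the function name label_parameter is hypothetical.

```python
def label_parameter(label_type, file_number=None):
    """Build the LABEL parameter text for a DD card.

    label_type is 'NL' (no label) or 'SL' (standard label); file_number is
    the position of the file on the volume, or None to take the default.
    """
    if label_type not in ("NL", "SL"):
        raise ValueError("label type must be NL or SL")
    if file_number is None:
        return f"LABEL=(,{label_type})"      # equivalent to file 1
    if file_number < 1:
        raise ValueError("file number must be 1 or greater")
    return f"LABEL=({file_number},{label_type})"

print(label_parameter("NL"))       # first file, no label
print(label_parameter("SL", 2))    # second file, standard label
```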
CROSS REFERENCE


Statistical Technique or Estimate       SOUPAC Source

Alpha factor analysis                   ITERATIVE
Analysis of Variance                    BALANOVA, T-TEST
Analysis of Covariance                  T-TEST, UMAVAC*
Autocorrelations                        AUTOCORRELATIONS
Beta Coefficients                       CANONICAL, CORRELATIONS, MULTIPLE,
                                        STEP-WISE
Biserial Correlations                   BISERIAL
Canonical Correlations                  CANONICAL
Canonical Factor Analysis               ITERATIVE
Centroid Factors                        CENTROID
Chi-Square                              CANONICAL, CHI-SQUARE, FREQUENCY, PROBIT
Clique analysis                         CLIQUE
Communality estimation                  COMMUNALITIES
Correlations                            AUTOCORRELATIONS, BISERIAL, CANONICAL,
                                        CORRELATIONS, DISCRIMINANT, MISSING,
                                        MULTIPLE, OBLIMAX, PARTIAL, STEP-WISE
Covariance, Analysis of (See
  Analysis of Covariance)
Covariance Matrix                       CORRELATION, K-CLASS, MULTIPLE, STEP-WISE
Covariance Matrix of Coefficients       K-CLASS
Covariance Matrix of reduced            ECONOMETRIC
  form residuals
Cross-products matrix                   CORRELATIONS, DISCRIMINANT, K-CLASS,
                                        MULTIPLE
Cross-Tabulation                        FREQUENCY
Data Processing                         MATRIX, STANDARD, TRANSFORMATIONS
Determination of a matrix               MATRIX
Determination, Coefficient of           MULTIPLE, STEPWISE, K-CLASS
Deviation covariance matrix             MULTIPLE
Deviation (partial) correlation         MULTIPLE
  matrix
Diagonal Method Factor Analysis         SQUARE
Dichotomous data                        BISERIAL, SCALOGRAM
Discriminant Function                   DISCRIMINANT
Discriminant Scores                     DISCRIMINANT
Dispersion Matrix                       DISCRIMINANT
Distribution Analysis                   FIT, FREQUENCY, KOLMOGOROV
Durbin-Watson Coefficients              ECONOMETRIC, MULTIPLE
Dwyer Extension analysis                PROCRUSTES
Eigenvalues                             DISCRIMINANT, JACOBI, K-CLASS,
                                        PRINCIPAL AXIS, THREE-MODE
Eigenvectors                            JACOBI, PRINCIPAL AXIS, THREE-MODE
F-Ratios                                BALANOVA, DISCRIMINANT, MULTIPLE,
                                        STEP-WISE, T-TEST
Factor Analysis                         COMMUNALITIES, ITERATIVE, OBLIMAX,
                                        UNRESTRICTED, VARIMAX, THREE-MODE
Factor Scores (See Factor Analysis)
Frequency Distributions                 FREQUENCY
Gamma Coefficients                      FREQUENCY
Goodman and Kruskal Coefficients        FREQUENCY
Iterative Principal Axis Solution       ITERATIVE
K-Class Estimation                      K-CLASS
Kolmogorov-Smirnov Statistic            KOLMOGOROV
Lambda Coefficients                     CANONICAL, DISCRIMINANT, FREQUENCY
Latin Squares                           UMAVAC
Least Squares Factor Fitting            PROCRUSTES
Limited Information Estimation          K-CLASS
Linear Programming                      LINEAR
Means                                   CORRELATIONS, DISCRIMINANT, K-CLASS,
                                        MISSING, MULTIPLE, T-TEST, STANDARD
Missing Data Correlations               MISSING
Moving Averages                         STANDARD
Multiple correlation                    CANONICAL, MULTIPLE, STEP-WISE
Noncard, non SOUPAC generated input     MATRIX (INPUT operation)
Oblique Factor Rotation                 BINORMAMIN, OBLIMAX
Orthogonal Factor Rotation              VARIMAX
Orthogonal Simple Structure             VARIMAX
Paired Comparisons                      PAIRED
Partial Correlation Coefficients        PARTIAL
Phi Coefficients                        Use CORRELATIONS
Point Biserial Correlations             CORRELATIONS
Predicted Values                        ECONOMETRIC, MULTIPLE
Primary Factor Pattern                  BINORMAMIN, OBLIMAX
Principal Components (See Factor
  Analysis)
Probit Analysis                         PROBIT
Punched Card Output                     MATRIX
Quadratic Programming                   QUADRATIC
Random Number, Generation of            RANDOM
Rank Order Correlations                 RANK
Recoding Variables                      TRANSFORMATIONS
Reduced Form Estimates                  ECONOMETRIC
Reduced Form Predicted Values           ECONOMETRIC
Reduced Form Residuals                  ECONOMETRIC
Reference Vector Structure              OBLIMAX, BINORMAMIN
Regression Analysis                     CORRELATION, K-CLASS, MULTIPLE, STEP-WISE
Regression Covariance                   MULTIPLE
Residual Analysis                       ECONOMETRIC
Rotation of Factors                     BINORMAMIN, OBLIMAX, VARIMAX
Sample Size                             CORRELATION, DISCRIMINANT, FREQUENCY,
                                        MISSING, MULTIPLE, STANDARD
Serial Correlations                     AUTOCORRELATIONS
Spearman's Rho                          RANK
Spearman's C                            FREQUENCY
Spectrum Analysis                       AUTOCORRELATIONS
Square Root Factors                     SQUARE
Standard Deviations                     CORRELATION, MULTIPLE, STANDARD, T-TEST
Standard Errors of Regression           K-CLASS, MULTIPLE, STEP-WISE
  Coefficients
Standardized Data                       STANDARD SCORES
Standardized Regression Coefficients    CANONICAL, MULTIPLE, STEP-WISE
Stepwise Maximum Likelihood Factor      ITERATIVE
  Analysis
Stepwise Multiple Regression            STEP-WISE
T-Statistic (Student's)                 CORRELATION, MULTIPLE, STEP-WISE,
                                        T-TEST
Tchuprow's T                            FREQUENCY
Tetrachoric Correlations                PARTIAL
Three Mode Factor Analysis              THREE MODE
Three Stage Least Squares               THREE STAGE
Two Stage Least Squares                 K-CLASS
Unrestricted Maximum Likelihood         UNRESTRICTED
  Factor Analysis
Variance (See Standard Deviations)
Variance, Analysis of (See Analysis
  of Variance)

*   See SOUPAC Consultants about using UMAVAC.

DATA  AND  MATRIX  MANIPULATIONS  PACKAGE 


The  data  manipulative  programs 

As  implied  by  the  name,  TRANSFORMATIONS  is  used  for  performing 
data transformations often necessary in "setting up" data to be input
to  one  of  the  "cook-book"  programs  of  the  SOUPAC  system.   The  user  may 
create  new  variables  as  linear  combinations  or  as  algebraic  functions  of 
old  variables,  and  the  user  may  recode  or  alter  data  values  on  the  basis 
of  test  conditions  by  using  the  TRANSFORMATIONS  program.   To  perform 
matrix  algebra  operations  with  one  or  more  matrices,  to  augment  matrices  either 
row-wise  or  column-wise,  to  reorder,  save  or  delete  specific  rows  or 
columns  of  a  matrix,  one  would  use  the  MATRIX  program.   MATRIX  also  has 
the  capability  of  printing,  punching  card  decks,  and  reading  and  writing 
tape  and  disk  files  in  ways  not  available  elsewhere  in  SOUPAC. 

The  TRANSFORMATIONS  and  MATRIX  programs  have  a  unique  place  in  the 
SOUPAC  system.   Although  the  remainder  of  the  SOUPAC  library  performs 
a  large  number  of  specialized  statistical  procedures,  there  are  some 
computations  which  are  not  represented  by  a  uniquely  written  program. 
However,  by  an  imaginative  utilization  of  the  combined  powers  of 
TRANSFORMATIONS  and  MATRIX  it  is  possible  to  perform  a  virtually  unlimited 
range  of  established  and  experimental  statistical  techniques.   It  is  a 
common  practice  within  the  SOUPAC  office  to  "check  out"  newly  written 
programs  against  results  computed  by  TRANSFORMATIONS  and  MATRIX.   Conversely, 
these  two  programs  can  be  used  as  teaching  tools  by  having  students  learn 
the  step-by-step  computations  and  then  checking  the  results  against  results 
of  the  "cook-book"  programs.   Complete  multiple  regressions  and  analysis 
of  variance  programs,  for  example,  have  been  written  in  this  manner.   Since 
most  SOUPAC  jobs  require  the  use  of  TRANSFORMATIONS  and  MATRIX,  a 
familiarity  with  these  two  programs  is  basic  to  an  effective  use  of  the 
SOUPAC  system. 


MATRIX 


I.  General  Description 

The  MATRIX  program  is  a  data  manipulating  program  for  inputting 
and  outputting,  creating,  performing  matrix  algebraic  operations,  and 
generally  handling  data  matrices.   All  the  MATRIX  suboperations  are 
restricted to 450 columns (variables). No absolute limit is set on
the  number  of  rows  (observations). 

Standard  SOUPAC  address  conventions  are  used  including  the  use  of 
the  character  X  to  denote  punched  output,  and  (f)  after  a  print  to 
denote  print  with  F  format.  Also  available  and  discussed  in  section  III 
below  is  the  use  of  I  for  storing  a  matrix  in  memory,  and  the  use  of  (l) 
after  a  print  to  invoke  the  MATRIX  labeling  feature.  All  other 
restrictions  are  noted  by  the  discussion  of  the  individual  suboperation 
explanations. 

II.   Parameters 

A.   Main  Parameters 

To  invoke  the  MATRIX  program,  code  the  name  MATRIX  (or  simply 
the  program  mnemonic  MAT).   There  are  two  optional  parameters 
available  which  may  be  coded  on  the  MAT  card. 

First, if it is desired to print, immediately prior to the
execution of each MATRIX subparameter operation, the time in
seconds since entry into the MATRIX program, code a (1) after the
name MATRIX. This option is not normally needed and is provided
merely for giving timing estimates.

The second optional parameter is coded as a (1) following the
timing estimate parameter. This second option causes the number
of rows and columns and the precision (either single or double) of
the answer matrix (for all suboperations which produce an answer
matrix) to be printed out. This is useful in debugging a MATRIX
program when it is not necessary to print the entire answer matrix
but it would be helpful to check the dimensions of
an answer matrix. As in all SOUPAC programs, the main program
parameter card must be terminated by a period. Examples:

MATRIX.

MATRIX (1).

MAT.

MAT (1).

MATRIX (1)(1).

MAT ( )(1).   -- recommended usage --

B.   Subparameters

Any  MATRIX  operation  may  be  invoked  by  coding  its  mnemonic 
followed  by  appropriate  subparameters.  All  operations  in  MATRIX 
handle  both  single  and  double  precision  matrices  at  the  control 
of  the  user  (see  operations  SINGLE  and  DOUBLE).   For  an  address 
not  explicitly  assigned  either  single  or  double  precision,  MATRIX 
assumes  a  default  of  double  precision  for  output  to  the  address. 
Terminate  all  subparameter  cards  with  a  period. 

To  end  a  MATRIX  program,  place  a  card  which  has  the  characters 
END  P  after  the  last  MATRIX  subparameter  card.   Since  all  MATRIX 
programs  must  have  at  least  one  subparameter  operation,  an  error 
will  be  signaled  if  a  MAT  card  is  followed  immediately  by  an 
END  P  card. 

Input and output for MATRIX may be from any source; however, the
following rules must be observed:

1)  Never  use  CARDS  as  input  to  any  operation  except 
MOVE  unless  both  the  number  of  rows  and  the  number 
of  columns  have  been  specified  on  the  DATA  format 
card  at  the  front  of  the  data  deck. 

2)  You  may  never  output  to  PRINT  only.  All  MATRIX  output 
must  go  to  some  intermediate  storage  location  even  when 
only  printout  is  desired. 

3)  Avoid  using  the  same  address  more  than  once  on  the  same 
parameter  card  unless  otherwise  noted  in  the  description 
of  an  individual  operation.  However,  in  those  operations 
which  do  permit  using  an  address  more  than  once  as  an  input 
address,  CARDS  may  not  be  used  as  an  input  address  more  than 
once.   In  all  operations  except  INVERT,  never  specify  an 
output  address  which  is  the  same  as  an  input  address  for 
that  operation. 

4)  The contents of an input address remain unchanged
during the execution of an operation unless otherwise
noted.

Following  is  a  description  of  the  subparameter  operations  currently 
in  the  MATRIX  program. 


Summary of MATRIX Operations

mnemonic  notes   operation name            examples

ABS               Absolute value            ABS (S1) (S2).
ADD       1,2,5   Add                       ADD (S1) (S2) (S3).
ALL       3,6     All                       ALL (S1) "GT" *0.* (S2).
                                            ALL (S1) "NE" *-0* (S2) (2).
                                            ALL (S1) "LE" (S2) (S3) (1,10,3).
ANY       3,6     Any                       ANY (S1) "GE" (S2) (S3).
                                            ANY (S1) "EQ" *1.* (S2) (1,2) (5
                                            ANY (S1) "LT" *99* (S2).
CHO       4       Cholesky decomposition    CHO (S1) (S2).
COL       6       Column delete             COL (S1) (S2) (2).
                                            COL (S1) (S2) (10) (12,15).
CON               Constant addition         CON (S1) *2.* (S2).
                                            CON (S1) (S2) (S3).
COU               Count                     COU (S1) (S2).
                                            COU (S1) (S2) (1).
                                            COU (S1) (S2) (2).
DIA               Diagonal to vector        DIA (S1) (S2).
DIM               Dimension                 DIM (S1) (S2).
DOU       2       Double precision          DOU (S1).
EJE               Eject                     EJE.
E-D       1,2,5   Elementwise divide        E-D (S1) (S2) (S3).
E-M       1,2,5   Elementwise multiply      E-M (S1) (S2) (S3).
E-R               Elementwise square root   E-R (S1) (S2).
EXP               Expand                    EXP (S1) *20* (S2).
                                            EXP (S1) (S2) (S3).
FIL       2       File                      FIL (S1).
GEN       6       Generate                  GEN (S1) *1*.
HOR       1,2,5   Horizontal augment        HOR (S1) (S2) (S3).
IDE               Identity matrix           IDE (48) (S1).
INP               Input                     INP (S1) (S2) ( ) (9) "(9F6.1)".
                                            INP (S1) (S2) ( ) (12).
                                            INP (S1) (I) (576) (10).
INV       4       Invert                    INV (S1) (S2).
                                            INV (I) (I).
                                            INV (S1) (S2) (1) (1) *10.E-6*.
LAB       6       Label                     LAB (S1) "SAMPLE 1" "AGE" "SEX".
LAG               Lag                       LAG (S1) (2) (6) ( ) (S2).
                                            LAG (S1) (3) (10) (1) (S2).
LOW               Lower triangle            LOW (S1) (S2).
                                            LOW (S1) (S2) (1).
MAX               Maximum value             MAX (S1) (S2).
                                            MAX (S1) (S2) (1).
                                            MAX (S1) (S2) (2).
MIN               Minimum value             MIN (S1) (S2).
                                            MIN (S1) (S2) (1).
                                            MIN (S1) (S2) (2).
MOV       2       Move                      MOV (C) (S1).
                                            MOV (S1) (S2).
MUL       1,4,5   Multiply                  MUL (S1) (S2) (S3).
OUT               Output                    OUT (S1) (S2) "(10E16.9)".
                                            OUT (S1) (S2).
PAR               Partition                 PAR (S1) (S2) (5) (10) (2) (21).
PER       6       Permutation               PER (S1) (2) (1).
PRI               Print                     PRI (S1) "(' ',10F13.4)".
PUN               Punch                     PUN (S1) "(8F10.3)".
REC               Reciprocal                REC (S1) (S2).
REM               Remap                     REM (S1) (S2) (8).
REW       2       Rewind                    REW (S1).
ROW       6       Row delete                ROW (S1) (S2) (1) (2) (3).
                                            ROW (S1) (S2) (2,20,3) (3,20,3).
SCA               Scalar multiply           SCA (S1) *.1* (S2).
                                            SCA (S1) (S2) (S3).
SIN       2       Single precision          SIN (S1).
SUB       1,2,5   Subtract                  SUB (S1) (S2) (S3).
SUM               Sum                       SUM (S1) (S2).
                                            SUM (S1) (S2) (1).
                                            SUM (S1) (S2) (2).
TRA       4       Transpose                 TRA (S1) (S2).
UPP               Upper triangle            UPP (S1) (S2).
                                            UPP (S1) (S2) (1).
VEC               Vector to diagonal        VEC (S1) (S2).
VER       1,2,5   Vertical augment          VER (S1) (S2) (S3).

1.  Conformability of input matrices is checked.

2.  Up to twenty-one total addresses may be used.

3.  A warning message is printed if no rows are output.

4.  Any matrix which has been previously stored under the I address will
be destroyed.

5.  An input address may be used more than once for input to the same
instruction.

6.  As many arguments as are needed of the last argument type may be used.


ABSOLUTE VALUE (mnemonic: ABS)

The  ABSOLUTE  VALUE  operation  has  two  address  parameters,  an  input 
address  and  an  output  address.   The  absolute  value  of  each  element  of 
the  input  matrix  is  taken  and  the  result  goes  to  the  output  address. 
Example:

ABSOLUTE VALUE (SEQ3)(SEQ4).

ADD  (mnemonic:   ADD) 

The  ADD  operation  has  from  three  to  twenty-one  address  parameters. 
The  last  address  is  the  output  address;  all  other  addresses  are  for 
input.   Each  input  matrix  must  have  the  same  number  of  rows  and  columns 
as  all  other  input  matrices.   An  address  may  be  used  more  than  once 
as  an  input  address. 

Corresponding  elements  of  the  first  matrix  through  the  next  to 
last  matrix  are  added  together,  and  the  result  goes  to  the  output 
address. Examples:

ADD (SEQ1)(SEQ4)(SEQ5).

ADD (SEQ1)(SEQ4)(SEQ2)(SEQ3)(SEQ5).

ALL  (mnemonic:  ALL) 

The  ALL  operation  performs  a  particular  test,  specified  by  the 
second  operand  as  a  relational  operator,  between  a  set  of  elements  for 
each  input  row  and  a  floating  point  number  specified  by  the  third 
operand.   If  all  elements  of  the  set  for  a  given  row  pass  the  test, 
that  row  is  output  to  the  output  address. 

The  first  parameter  is  the  input  address.   The  relational 
operator  is  enclosed  in  quotation  marks.  The  third  operand  may  be 
either  a  floating  point  number  or  an  address  in  which  case  the 
first  element  of  the  matrix  is  used  as  the  floating  point  number. 
The  six  legal  relational  operators  are  "LT",  "LE",  "EQ",  "NE", 
"GT",  and  "GE".   Remaining  (optional)  parameters  are  index 
sets  specifying  which  variables  are  to  be  included  in  the  testing. 
If  no  variables  are  specified,  all  variables  are  included  in  the 
testing.  Note  that  if  only  one  variable  is  specified,  the  results 
of  the  ANY  and  the  ALL  operation  would  be  the  same.  Examples: 

ALL (SEQ1) "NE" *0.* (SEQ2).
ALL (SEQ3) "GE" (SEQ4)(SEQ1).

ANY  (mnemonic:  ANY) 

The  ANY  operation  performs  a  particular  test,  specified  by  the 
second  operand  as  a  relational  operator,  between  a  set  of  elements  for 
each  input  row  and  a  floating  point  number  specified  by  the  third 
operand.   If  any  element  of  the  set  for  a  given  row  passes  the  test, 
that row is output to the output address.

The first parameter is the input address. The relational
operator is enclosed in quotation marks. The third operand may be
either a floating point number or an address in which case the
first element of the matrix is used as the floating point number.
The six legal relational operators are "LT", "LE", "EQ", "NE", "GT",
and "GE". Remaining (optional) parameters are index sets specifying
which variables are to be included in the testing. If no variables
are specified, all variables are included in the testing. Note that
if only one variable is specified, the results of the ANY and the ALL
operation would be the same. Examples:

ANY (SEQ2) "GT" *3.* (SEQ1).

ANY (SEQ1) "NE" *-0.* (SEQ4)(1,3).
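
The row selection performed by ALL and ANY can be sketched in Python. This is only an illustration of the rule described above, not SOUPAC code: the function name select_rows, the list-of-lists matrix, and the use of plain 1-based column numbers in place of SOUPAC index sets are all assumptions made for the example.

```python
import operator

# Relational operators, named as in the MATRIX documentation.
RELATIONS = {
    "LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
    "NE": operator.ne, "GT": operator.gt, "GE": operator.ge,
}


def select_rows(matrix, relation, value, columns=None, mode="ALL"):
    """Return the rows whose tested elements pass `relation` against `value`.

    columns: 1-based column numbers to test (a hypothetical stand-in for
             index sets); None means every column is tested.
    mode:    "ALL" keeps a row only if every tested element passes;
             "ANY" keeps it if at least one tested element passes.
    """
    test = RELATIONS[relation]
    combine = all if mode == "ALL" else any
    kept = []
    for row in matrix:
        cols = range(1, len(row) + 1) if columns is None else columns
        if combine(test(row[c - 1], value) for c in cols):
            kept.append(list(row))
    return kept


data = [[1.0, 5.0, 0.0],
        [4.0, 6.0, 7.0],
        [0.0, 0.0, 2.0]]

all_rows = select_rows(data, "GT", 3.0, mode="ALL")
any_rows = select_rows(data, "GT", 3.0, mode="ANY")

# With a single tested variable, ALL and ANY select the same rows.
one_col_all = select_rows(data, "NE", 0.0, columns=[1], mode="ALL")
one_col_any = select_rows(data, "NE", 0.0, columns=[1], mode="ANY")
```

Here all_rows holds only the second row, while any_rows holds the first two rows; the last two calls illustrate the remark that the two operations agree when only one variable is specified.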


CHOLESKY  (mnemonic:   CHO) 

The  CHOLESKY  operation  decomposes  a  square  symmetric  matrix  into 
the  product  of  an  upper  triangular  matrix  and  a  lower  triangular 
matrix  such  that  the  two  triangular  matrices  are  the  transpose  of 
each  other.   CHOLESKY  has  two  operands,  an  input  address  and  an 
output  address.   The  result  which  goes  to  the  output  address  is 
the  lower  triangular  matrix  resulting  from  the  decomposition. 

If  the  input  matrix  is  not  square,  the  "extra"  rows  or  columns 
are  ignored.   Additionally,  if  the  square  matrix  is  not  symmetric, 
the  actual  upper  triangle  of  the  matrix  is  effectively  ignored 
and is instead assumed to be identical to the lower triangle.
Example:

CHOLESKY (SEQ1)(SEQ2).

If the input matrix on SEQ1 is

 4   -2   -4
-2    2    3
-4    3    6

the resulting matrix output to SEQ2 is

 2    0    0
-1    1    0
-2    1    1
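
The worked example can be checked with a short Python sketch. It assumes the textbook Cholesky recurrence, which may or may not be the exact routine SOUPAC uses; the function name cholesky_lower and the list-of-lists representation are hypothetical. As in the description above, only the lower triangle is read and extra rows or columns are ignored.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L such that L times its transpose equals the
    symmetric matrix implied by the lower triangle of `a`."""
    n = min(len(a), len(a[0]))          # "extra" rows or columns are ignored
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):          # only a[i][j] with j <= i is read
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


example = [[4, -2, -4],
           [-2, 2, 3],
           [-4, 3, 6]]
lower = cholesky_lower(example)

# Multiply L by its transpose to confirm that the input is recovered.
product = [[sum(lower[i][k] * lower[j][k] for k in range(3))
            for j in range(3)] for i in range(3)]
```

For this input the sketch reproduces the matrix shown for SEQ2, and product equals the original matrix.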

COLUMN  DELETE  (mnemonic:   COL) 

The  COLUMN  DELETE  operation  specifies  which  columns  of  an  input 
matrix  are  to  be  deleted  before  sending  the  result  to  the  output 
address.   The  first  parameter  is  the  input  address.   The  second 
parameter  is  the  output  address.   Columns  to  be  deleted  are 
specified  by  index  sets  following  the  output  address.  Examples: 

COLUMN DELETE (SEQ4)(SEQ5)(4)(5)(6)(8)(13)(15).
COLUMN DELETE (SEQ1)(SEQ2)(1,30,3).

CONSTANT (mnemonic: CON)

The CONSTANT operation has three parameters. A floating point
number specified by the second operand is added to every element of
the matrix specified by the first operand address, and the result goes to
the third operand address. Examples:

CONSTANT (SEQ1) *4.5* (SEQ2).
CONSTANT (SEQ1)(SEQ3)(SEQ4).

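COUNT (mnemonic: COU)

The COUNT operation has three operands; an input address, an
output address, and an option indicator.

If option 0 is specified, the resulting output matrix is a single
row vector containing the count of the number of elements, excluding
missing data, of each column of the input matrix. Specifying no
option is equivalent to specifying option 0.

If option 1 is specified, the resulting output matrix is a single
column vector containing the count of the number of elements, excluding
missing data, of each row of the input matrix.

If the option is specified as any number other than 0 or 1, a
single element matrix is output which contains the count of the
elements, excluding missing data, over the entire matrix.
Examples:

COUNT (SEQ1)(SEQ4).
COUNT (SEQ5)(SEQ3)(1).
COUNT (SEQ2)(SEQ3)(2).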

DIAGONAL (mnemonic: DIA)

The DIAGONAL operation has two operands; an input address and
an output address. The diagonal elements of the first
matrix are used to form a vector which is output to the
second operand address. Example:

DIAGONAL (SEQ2)(SEQ4).

DIMENSION (mnemonic: DIM)

The DIMENSION operation has two address parameters, an input
address and an output address. The number of rows and the number
of columns of the input matrix are used to form the first and
second elements respectively of a vector
which is output to the output address. Example:

DIM (SEQ2)(SEQ1).

DOUBLE (mnemonic: DOU)

The  DOUBLE  operation  has  anywhere  from  one  to  twenty-one 
addresses  as  parameters.   Listing  an  address  as  a  parameter  negates 
the  effect  of  any  previous  listing  of  that  address  as  a  parameter 
in  the  operation  SINGLE.   Listing  an  address  as  a  parameter  which 
has  not  appeared  as  a  SINGLE  subparameter  has  no  effect.  Example: 

DOUBLE (SEQ1)(SEQ2)(SEQ3)(SEQ4)(SEQ5)(I).

EJECT (mnemonic: EJE)

The EJECT operation causes the next printout to begin at the
top of a new page. EJECT has no parameters. Example:

EJECT.

E-DIVIDE -- Elementwise Divide -- (mnemonic: E-D)

The  E-DIVIDE  operation  has  from  three  to  twenty-one  address 
parameters.  The  last  address  is  the  output  address;  all  other 
addresses  are  for  input.  Each  input  matrix  must  have  the  same 
number  of  rows  and  columns  as  all  other  input  matrices  for  the  use 
of  the  operation.  An  address  may  be  used  more  than  once  as  an 
input address.

Elements of the second matrix through the next to last matrix are
divided into the corresponding elements of the first matrix. Output
goes to the last address. Example:

E-DIVIDE (SEQ1)(SEQ2)(SEQ3).

E-MULTIPLY  --  Elementwise  Multiply  --  (mnemonic:  E-M) 

The  E-MULTIPLY  operation  has  from  three  to  twenty-one  address 
parameters.   The  last  address  is  the  output  address;  all  other 
addresses  are  for  input.   Each  input  matrix  must  have  the  same 
number  of  rows  and  columns  as  all  other  input  matrices  for  the  use 
of  the  operation.  An  address  may  be  used  more  than  once  as  an 
input  address. 

Corresponding  elements  of  the  first  matrix  through  the  next  to 
last  matrix  are  multiplied  together.   Output  goes  to  the  last 
address. Example:

E-MULTIPLY (SEQ3)(SEQ2)(I)(SEQ4).

E-ROOT -- Elementwise Square Root -- (mnemonic: E-R)

The  E-ROOT  operation  has  two  address  parameters,  an  input  address 
and  an  output  address.   The  (positive)  square  root  of  each  element 
of  the  input  matrix  is  taken  and  the  result  goes  to  the  output 
address. Example:

E-ROOT (SEQ1)(SEQ2).

EXPAND (mnemonic: EXP)

The EXPAND operation copies the input matrix to the output address
the number of times specified by the second parameter. The second
parameter can be either a floating point number, in which case the
input matrix is copied the number of times specified by the integer
portion of the floating number; or the second parameter can be an
address, in which case the input matrix is copied to the output
address until the output matrix has the same number of rows as the
matrix at that address. The third parameter is the output address.
Examples:

EXPAND (SEQ1)(SEQ2)(SEQ3).
EXPAND (SEQ1)*55*(SEQ4).

FILE    (mnemonic:     FIL) 

The FILE operation has from one to twenty-one addresses
as parameters. FILE [illegible] to be
written at the end of [illegible]. [illegible]
most useful to the user [illegible]
his own physical tape. Any meaningful use of the FILE
operation requires the addition of appropriate IBM 360 JCL cards.
Example:

FILE (SEQ5).

GENERATE (mnemonic: GEN)

The GENERATE operation generates a single row vector with the
floating point numbers the user specifies. Example:

GENERATE (SEQ2) *1.* *2.* *4.* *8.* *16.* *32.*.

HORIZONTAL AUGMENT (mnemonic: HOR)

The HORIZONTAL AUGMENT operation has from three to twenty-one
address parameters. The last address is the output address; all
other addresses are for input. Each input matrix must have the
same number of rows. The input matrices are augmented horizontally,
and the result goes to the last address. Example:

HORIZONTAL AUGMENT (SEQ1)(SEQ4)(SEQ2)(I).

IDENTITY (mnemonic: IDE)

The  IDENTITY  operation  has  two  parameters.  An  identity  matrix, 
of  order  specified  by  a  fixed  point  number  as  the  first  operand,  is 
output  to  the  address  specified  by  the  second  operand.  Example: 

IDENTITY  (20)(SEQ3). 

INPUT (mnemonic: INP)

The INPUT operation will input formatted or non-formatted
records from any available device. This option is primarily for
reading card images or other similar data the user may have,
usually on his own tape, which would be awkward to input in the
typical card deck manner.

Never input to I (see SPECIAL COMMENTS) using the INPUT operation
unless both the number of rows and the number of columns of the
input matrix are specified as parameters on the INPUT operation
parameter card.

The parameters for INPUT are the input address, the output
address, the number of rows of the input matrix (optional in
most cases), number of columns, and optionally the format enclosed
in quotation marks. Examples:

INPUT    (SEQ1)(SEQ2/PRINTH20U5)    "(10F8.3)". 
INPUT    (SEQ2)(SEQ3U    H8)  . 

INVERT  (mnemonic:   INV) 

The  INVERT  operation  inverts  a  non-singular  real  matrix. 
The  INVERT  operation  has  five  subparameters,  the  last  three  of 
which  are  optional.   The  first  parameter  is  the  address  of  the  matrix 
to  be  inverted,  and  the  second  parameter  is  the  output  address  of 
the result. (The incore address option described in section III.A -
SPECIAL COMMENTS - may be used for either input, output, or both.)
To have the determinant of the original matrix printed out, code a
(1) as the third parameter.

The  inversion  technique  used  is  the  Gauss-Jordan  method  with 
pivot  elements  assumed  to  be  on  the  main  diagonal.   If  it  is 
desired  that  the  inversion  technique  perform  row  and  column 
interchange,  for  the  purpose  of  picking  pivot  elements  as  those  with 
the  largest  absolute  value  at  each  step  of  the  elimination  procedure, 
code a (1) as the fourth parameter. The default case, pivot elements
assumed  to  be  on  the  main  diagonal,  executes  faster  than  when  row 
and  column  interchange  is  performed.   For  those  real  symmetric 
matrices  which  have  the  property  that  the  largest  elements  are 
necessarily  on  the  main  diagonal  (e.g.  correlation,  cross-products, 
variance-covariance matrices), numerical accuracy of the results is
not significantly different between the two options. For general
matrices in which specific properties are not known, using row and
column interchange will probably produce more accurate results.

If the absolute value of any pivot element is [illegible]
singularity, [illegible].

To solve the system of equations

AX = Y

input to the INVERT suboperation a matrix which contains A:Y, with
the constant term appearing as the last column(s) (variables). The
resulting output of the INVERT suboperation will be

A^-1:X

The Y above may be more than one column vector, in which case each
resulting column vector of X will be the solution for the corresponding
column of Y. Examples:

INVERT  (SEQ1)(SEQ2). 

INVERT (SEQ4)(SEQ3)(1)(1).

INVERT (SEQ2)(SEQ4)(1)( ) *10.E-5*.

INVERT (SEQ5)(SEQ1)( )(1) *.0000001*.
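
The use of INVERT to solve AX = Y can be illustrated with a small Python sketch of in-place Gauss-Jordan elimination with pivots taken on the main diagonal (the default case described above). The function name invert_with_rhs and the tol argument are hypothetical, and the way tol is applied is only an assumed stand-in for the tolerance given as the fifth parameter; this is not the SOUPAC routine itself.

```python
def invert_with_rhs(aug, n, tol=1e-10):
    """Gauss-Jordan elimination with pivots on the main diagonal.

    aug: n rows; the first n columns hold A, any further columns hold Y.
    Returns a new matrix whose first n columns hold the inverse of A and
    whose remaining columns hold X, the solution of AX = Y.
    """
    m = [[float(x) for x in row] for row in aug]
    width = len(m[0])
    for p in range(n):
        pivot = m[p][p]
        if abs(pivot) < tol:            # assumed singularity check
            raise ValueError("pivot too small; matrix treated as singular")
        # In-place inversion: the pivot column is overwritten with the
        # corresponding column of the inverse as elimination proceeds.
        m[p][p] = 1.0
        for j in range(width):
            m[p][j] /= pivot
        for i in range(n):
            if i != p:
                factor = m[i][p]
                m[i][p] = 0.0
                for j in range(width):
                    m[i][j] -= factor * m[p][j]
    return m


# A = [[2, 1], [1, 3]] with Y = [5, 10] appended as the last column.
result = invert_with_rhs([[2, 1, 5],
                          [1, 3, 10]], n=2)
inverse = [row[:2] for row in result]
solution = [row[2] for row in result]
```

In this example the last column of result is the solution x = 1, y = 3, and the first two columns hold the inverse of A. A version with row and column interchange would search for the largest remaining element before each step; that variant is not shown here.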

LABEL  (mnemonic:  LAB) 

The LABEL operation is used to store a title and column labels at
a specified address for use within a MATRIX program. The title
is limited to 128 characters. Labels are limited to eight characters
each.

The first parameter is the address where the title and labels
are to be stored. This is then followed by the title and labels,
each enclosed in quotation marks. Only one label set is available at
any one time. Hence, each use of LABEL overrides all previous uses.
Labels defined within a MATRIX program may not be passed to other
programs. (Note: The incore address option may be used to store a
title and label set if desired.) Examples:

LABEL (SEQ4) "TITLE" "LABEL 1" "LABEL 2" "LABEL 3".
LABEL (SEQ1) "ONLY A TITLE; NO LABELS".

LAG  (mnemonic:  LAG) 

The LAG operation has five operands; an input address, an
integer specifying which variable is to be lagged, an integer
specifying the number of lag periods, an integer option flag,
and an output address.

If the option flag is zero, intermediate lags are included
in the output matrix; if the option flag is non-zero, intermediate
lags are not included.

For  example,  suppose  we  are  interested  in  lagging  the 
fourth  variable  of  a  five  variable  matrix  and  suppose  we  want 
three  lag  periods.  First  it  should  be  noted  that  the  resulting 
output  matrix  will  necessarily  have  three  fewer  rows  (observations) 
than  the  input  matrix. 

If we use

LAG(S1)(4)(3)(0)(S2).

the resulting t-th row to the output address would be

X(t,1) X(t,2) X(t,3) X(t,4) X(t,5) X(t-3,4) X(t-2,4) X(t-1,4)

If we use

LAG(S1)(4)(3)(1)(S2).

the resulting t-th row to the output address would be

X(t,1) X(t,2) X(t,3) X(t,4) X(t,5) X(t-3,4)

Note that the difference between the option flag being zero
and non-zero is that the resulting output matrix has N-1
fewer lags where N is the number of lag periods (i.e. the
intermediate lags are not included in the output).

As a concrete example consider the following input on S1.

 1    2    3
 4    5    6
 7    8    9
10   11   12
13   14   15
16   17   18

Use of the statement

LAG(S1)(2)(2)( )(S2).

will result in the following matrix to be output to S2.

 7    8    9    2    5
10   11   12    5    8
13   14   15    8   11
16   17   18   11   14

Use of the statement

LAG(S1)(2)(2)(1)(S2).

will result in the following matrix to be output to S2.

 7    8    9    2
10   11   12    5
13   14   15    8
16   17   18   11
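
The lagging rule can be sketched in Python as follows. This is an illustration under stated assumptions, not SOUPAC code: the function name lag and the boolean skip_intermediate (standing in for a non-zero option flag) are hypothetical, and the matrix is a plain list of rows.

```python
def lag(matrix, variable, periods, skip_intermediate=False):
    """Append lagged values of one column to each row.

    variable: 1-based number of the column to be lagged.
    periods:  number of lag periods N; the output has N fewer rows.
    skip_intermediate: False appends lags N, N-1, ..., 1;
                       True appends only the N-period lag.
    """
    col = variable - 1
    out = []
    for t in range(periods, len(matrix)):
        if skip_intermediate:
            lags = [matrix[t - periods][col]]
        else:
            lags = [matrix[t - k][col] for k in range(periods, 0, -1)]
        out.append(list(matrix[t]) + lags)
    return out


s1 = [[1, 2, 3],
      [4, 5, 6],
      [7, 8, 9],
      [10, 11, 12],
      [13, 14, 15],
      [16, 17, 18]]

with_intermediate = lag(s1, variable=2, periods=2)
without_intermediate = lag(s1, variable=2, periods=2, skip_intermediate=True)
```

Applied to the six-row input above, the two calls reproduce the two output matrices shown for S2.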

LOWER TRIANGLE (mnemonic: LOW)

The  LOWER  TRIANGLE  instruction  copies  a  matrix  from  one 
address  to  another  and  sets  all  elements  which  are  above  the 
main  diagonal  to  zero.   It  is  possible  to  indicate  if  it  is 
desired  that  the  main  diagonal  elements  also  be  set  to  zero. 

The  LOWER  TRIANGLE  instruction  has  three  operands;  an 
input  address,  an  output  address,  and  an  integer  option  flag. 
If  the  option  flag  is  omitted  or  is  zero,  the  main  diagonal 
elements  are  included  as  part  of  the  lower  triangle.   If  the 
option  flag  is  non-zero,  the  main  diagonal  elements  are  set  to 
zero.  Examples: 

LOWER (S1)(S3).
LOWER (S2)(S4)(1).
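
As a minimal sketch of the rule just described, the following Python function zeroes the elements above the main diagonal and, on request, the diagonal itself. The name lower_triangle and the boolean zero_diagonal (standing in for a non-zero option flag) are assumptions made for the illustration.

```python
def lower_triangle(matrix, zero_diagonal=False):
    """Copy a matrix, setting elements above the main diagonal to zero.

    zero_diagonal: when True, the main diagonal is set to zero as well.
    """
    result = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, x in enumerate(row):
            keep = j < i or (j == i and not zero_diagonal)
            new_row.append(x if keep else 0.0)
        result.append(new_row)
    return result


square = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]

with_diagonal = lower_triangle(square)
strictly_lower = lower_triangle(square, zero_diagonal=True)
```

The input list is left unchanged, in keeping with the rule that the contents of an input address are not altered by an operation.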

MAXIMUM  (mnemonic:  MAX) 

The  MAXIMUM  operation  has  three  operands;  an  input  address,  an 
output  address,  and  an  option  indicator. 

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single 
row  vector  containing  the  maximum  element  of  each  column  of  the 
input  matrix.   Specifying  no  option  is  equivalent  to  specifying  option 
0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a  single 
column  vector  containing  the  maximum  element  of  each  row  of  the 
input  matrix. 

If  the  option  is  specified  as  any  number  other  than  0  or  1,  a 
single  element  matrix  is  output  which  contains  the  maximum  element 
of  the  entire  matrix.  Examples: 

MAX  (SEQ1)(SEQ3). 
MAX  (SEQ1)(SEQ2)(2). 
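
The three option values can be summarized in a short Python sketch. The function name maximum and the list-of-lists matrix are hypothetical; the handling of missing data, if any, is not modeled.

```python
def maximum(matrix, option=0):
    """Return column maxima, row maxima, or the overall maximum.

    option 0:        a single row vector of column maxima.
    option 1:        a single column vector of row maxima.
    any other value: a single element matrix holding the overall maximum.
    """
    if option == 0:
        return [[max(column) for column in zip(*matrix)]]
    if option == 1:
        return [[max(row)] for row in matrix]
    return [[max(max(row) for row in matrix)]]


values = [[1.0, 9.0, 3.0],
          [7.0, 2.0, 8.0]]

column_maxima = maximum(values)       # one row, three columns
row_maxima = maximum(values, 1)       # two rows, one column
overall = maximum(values, 2)          # one row, one column
```

Each result is kept as a matrix (a list of rows), so that the shape of the output described above is visible in the data structure.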

MINIMUM (mnemonic: MIN)

The  MINIMUM  operation  has  three  operands;  an  input  address,  an 
output  address,  and  an  option  indicator. 

If  option  0  is  specified,  the  resulting  output  matrix  is  a 
single  row  vector  containing  the  minimum  element  of  each  column 
of  the  input  matrix.   Specifying  no  option  is  equivalent  to 
specifying  option  0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a 
single  column  vector  containing  the  minimum  element  of  each  row 
of  the  input  matrix.  Examples: 

MIN (SEQ1)(SEQ3).
MIN (SEQ1)(SEQ2)(2).

MOVE (mnemonic: MOV)

The  MOVE  operation  moves  (actually  copies)  a  matrix  from  one 
SOUPAC  standard  input  source  to  another.   If  reading  from  SEQUENTIAL, 
the  MOVE  operation  assumes  that  the  data  set  was  created  using  SOUPAC 
conventions,  i.e.  by  some  SOUPAC  program.   If  the  input  source  is 
CARDS,  the  input  deck  must  be  preceded  by  a  correct  DATA  format 
statement  and  terminated  by  an  END#  card. 

Never  MOVE  from  CARDS  to  I  (see  SPECIAL  COMMENTS)  unless  both 
number  of  rows  and  number  of  columns  of  the  input  matrix  are 
specified  at  the  front  of  the  data  deck. 

The  operation  has  between  two  and  twenty-one  addresses  as 
parameters.   The  first  address  is  the  input  address.   All  remaining 
addresses  are  output  addresses.   Examples: 

MOVE (CARDS)(SEQ1)(SEQ2).
MOVE (CARDS)(SEQ1).

MULTIPLY  (mnemonic:  MUL) 

The  MULTIPLY  operation  has  three  addresses  for  parameters.  A 
matrix  multiplication  is  performed  between  the  matrices  on  the  first 
two  addresses  and  the  result  is  stored  in  the  third  address.   The 
MULTIPLY operation permits the same address to be used as the
input address for both the first and second operands. This usage is
equivalent to using the SQUARE suboperation.

The  incore  address  option  may  not  be  used  for  input  of  the  first 
operand  if  the  first  operand  is  different  from  the  second.   The 
incore  address  option  may  never  be  used  for  output  of  the  result. 
The  MULTIPLY  operation  destroys  any  matrix  which  has  been  stored 
in  core  using  the  incore  address  option  (see  section  III. A  -  SPECIAL 
COMMENTS). All calculations are done in double precision. Examples:

MULTIPLY (SEQ1)(SEQ2)(SEQ3).
MULTIPLY (SEQ1)(SEQ1)(SEQ2).
MULTIPLY (SEQ2)(I)(SEQ5).

OUTPUT (mnemonic: OUT)

The  OUTPUT  operation  outputs  a  matrix,  a  row  at  a  time  with  or 
without  format  control,  to  a  user  specified  data  set.   If  format 
control  is  used,  the  output  address  specified  should  not  be  used 
anywhere  else  in  the  current  SOUPAC  job  step  except  with  options 
which also perform formatted I/O (e.g. the INPUT and OUTPUT operations
of  MATRIX).   The  syntax  of  OUTPUT  is  two  addresses,  input  and 
output  addresses  respectively,  optionally  followed  by  the  desired 
format  enclosed  by  quotation  marks.  Examples: 

OUTPUT (SEQ1)(SEQ5) "(20E15.7)".

OUTPUT (SEQ2)(SEQ3).

PARTITION (mnemonic: PAR)


The  PARTITION  operation  is  used  to  select  a  sub-matrix  of  an 
original  input  matrix.   The  first  operand  is  the  input  address 
and  the  second  operand  is  the  output  address.   The  next  four 
parameters  specify  in  order,  the  beginning  column  of  the  partition, 
the  ending  column  of  the  partition,  the  beginning  row  of  the  partition, 
and the ending row of the partition. If either beginning parameter is
left out, the partition begins with the first row (or column). If
either ending parameter is left out, the partition ends with the last
row (or column). Examples:

PARTITION (SEQ5)(SEQ2)(5)(6)(2)(50).
PARTITION (SEQ4)(SEQ2)(3)(40).
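
The selection rule above can be sketched in Python. This is only an illustration, not SOUPAC code: the function name `partition` and the list-of-lists matrix layout are assumptions made for the example. Bounds are 1-based and inclusive, and an omitted bound falls back to the first or last row (or column) as described in the text.

```python
def partition(matrix, col_start=None, col_end=None, row_start=None, row_end=None):
    """Return the sub-matrix bounded by 1-based, inclusive limits.

    Parameter order mirrors the PARTITION statement: beginning column,
    ending column, beginning row, ending row.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    c0 = 1 if col_start is None else col_start
    c1 = n_cols if col_end is None else col_end
    r0 = 1 if row_start is None else row_start
    r1 = n_rows if row_end is None else row_end
    return [row[c0 - 1:c1] for row in matrix[r0 - 1:r1]]


# A 4 x 6 test matrix whose element in row r, column c is 10*r + c.
m = [[10 * r + c for c in range(1, 7)] for r in range(1, 5)]
sub = partition(m, 5, 6, 2, 3)   # columns 5-6 of rows 2-3
```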

PERMUTATION  (mnemonic:   PER) 

The  PERMUTATION  operation  permutes,  on  option,  rows  or  columns 
or  rows  and  columns  of  an  input  matrix.   The  resulting  matrix  is 
output  to  the  second  operand  output  address. 

The  third  operand  is  an  option  flag.   If  the  option  flag  is 
specified  as  zero,  columns  of  the  matrix  are  permuted.   If  the  option 
flag  is  specified  as  one,  rows  of  the  matrix  are  permuted.   If  the 
option  flag  is  specified  as  other  than  zero  or  one,  both  rows  and 
columns  are  permuted. 

The  order  of  columns  or  rows  of  the  output  matrix  is  determined 
by  index  sets.  Examples: 

If the statement

PERMUTE (S1)(S2)(0)(5)(1,4).

is used and S1 has the matrix

 1.    2.    3.    4.    5.
-1.    0.    2.     .1    .3

the resulting output matrix will be

 5.    1.    2.    3.    4.
  .3  -1.    0.    2.     .1

If the input matrix had been

 1.    2.    3.    4.    5.    6.
-1.    0.    2.     .1    .3   4.

the resulting output matrix would be the same as the output matrix
already listed. Note that this implies that only those columns or
rows which are explicitly listed will be output.

If the statement

PER (S1)(S2)(1)(3)(2)(1).

is used and the matrix on S1 is

1    2    3
4    5    6
7    8    9

the resulting output matrix will be

3    2    1
6    5    4
9    8    7

If the statement

PER (S1)(S2)(2)(3)(2)(1).

is used on the matrix

1    2    3
4    5    6
7    8    9

the resulting output matrix will be

9    8    7
6    5    4
3    2    1
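
As a rough illustration of the option flag, the following Python sketch follows the flag description in the text (0 = columns, 1 = rows, any other value = both). The function name `permute` is hypothetical, the index sets are assumed to be already expanded into a flat list of 1-based indices (so that (5)(1,4) becomes 5, 1, 2, 3, 4), and using the same order for rows and columns when both are permuted is likewise an assumption.

```python
def permute(matrix, flag, order):
    """Reorder columns (flag 0), rows (flag 1), or both (any other flag).

    `order` lists the 1-based indices to output, in output order;
    rows or columns that are not listed are dropped.
    """
    idx = [i - 1 for i in order]
    if flag == 0:                       # permute columns only
        return [[row[j] for j in idx] for row in matrix]
    if flag == 1:                       # permute rows only
        return [list(matrix[i]) for i in idx]
    # any other flag: apply the order to rows and columns alike
    return [[matrix[i][j] for j in idx] for i in idx]


cols = permute([[1, 2, 3, 4, 5], [-1, 0, 2, 0.1, 0.3]], 0, [5, 1, 2, 3, 4])
both = permute([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2, [3, 2, 1])
```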
PRINT (mnemonic: PRI)


The  PRINT  operation  prints  out  a  matrix,  one  row  at  a  time,  under 
the  control  of  a  user  supplied  format.  Formats  follow  FORTRAN  IV 
conventions  with  the  added  restriction  that  formats  are  limited  to 
592  characters. 

The  first  parameter  is  the  address  of  the  matrix  to  be  printed. 
The  second  parameter  is  the  format  enclosed  in  quotation  marks. 
(Warning:  Allow  for  carriage  control  as  the  first  character  in 
output  lines.  A  print  line  has  133  characters.)  Example: 

PRINT  (SEQ2)  "('  ',3F20.10)". 

PUNCH  (mnemonic:   PUN) 

The  PUNCH  operation  has  the  same  syntax  as  PRINT  and  is  used  to 
punch  out  a  matrix  under  the  control  of  a  user  supplied  format. 
(Warning: When punching cards, remember that there is room for only
80 characters per card.)

The  PUNCH  operation  always  punches  two  cards  in  addition  to  the 
actual  data  deck.  At  the  front  of  the  data  is  punched  a  DATA  format 
card,  and  at  the  end  of  the  data  is  punched  an  END#  card.  Example: 

PUNCH  (SEQ1)"(8F10.2)". 

RECIPROCAL  (mnemonic:   REC) 

The RECIPROCAL operation has two operands, an input address and
an output address. The reciprocals of the elements of the first
matrix form an output matrix, which is output to the
second operand address. Example:

RECIPROCAL (SEQ1)(SEQ3).

REMAP (mnemonic: REM)

The  REMAP  instruction  is  used  when  a  row  of  the  input 
matrix  represents  several  observations  and  it  is  desired  to 
"break  up"  the  input  row  into  several  shorter  output  rows. 

The  REMAP  instruction  has  three  operands;  an  input  address, 
an  output  address,  and  an  integer  which  is  to  be  used  as  the 
column  dimension  of  the  output  address. 

For example, if we have the row of data on S1

1.    2.    3.    4.    5.    6.

and use the instruction

REMAP (S1)(S2)(2).

the resulting output to S2 would be

1.    2.
3.    4.
5.    6.


Notice  that  the  column  dimension  to  be  specified  for  the 
output  matrix  must  exactly  divide  the  column  dimension  of  the 
input  matrix. 
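
A minimal Python sketch of the same row-splitting rule follows. The name `remap` and the list-of-lists layout are assumptions for illustration; raising an error when the output width does not divide the input width is also an assumption about how the restriction might be enforced.

```python
def remap(matrix, out_cols):
    """Break each input row into consecutive output rows of `out_cols` columns."""
    out = []
    for row in matrix:
        if len(row) % out_cols != 0:
            raise ValueError("output column dimension must divide the input column dimension")
        for k in range(0, len(row), out_cols):
            out.append(row[k:k + out_cols])
    return out


result = remap([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]], 2)
```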
REWIND (mnemonic: REW)

The REWIND operation has anywhere from one to twenty-one addresses
as parameters. REWIND is used to rewind a sequential file. The
REWIND operation need only be used with the INPUT operation when
it is desired to reread a formatted input file. Example (to input
the same formatted file from SEQ3 onto both SEQ1 and SEQ2 under
control of different formats):

INPUT (SEQ3)(SEQ1)( )(5) "(10X,5F10.0)".

REWIND (SEQ3).

INPUT (SEQ3)(SEQ2)( )(8) "(8F10.0)".

ROW  DELETE  (mnemonic:  ROW) 

The  ROW  DELETE  operation  specifies  which  rows  of  an  input  matrix 
are  to  be  deleted  before  sending  the  result  to  the  output  address. 
The  first  parameter  is  the  input  address.   The  second  parameter  is 
the  output  address.   Rows  to  be  deleted  are  specified  by  index 
sets  following  the  output  address.   Example: 

ROW DELETE (I)(SEQ3)(1)(3)(7)(8)(11)(44).

SCALAR (mnemonic: SCA)

The SCALAR operation has three parameters. Every element of the
matrix specified by the first operand is multiplied by a floating
point number specified by the second operand. The result goes to
the third operand address.

The  second  operand  may  be  either  a  floating  point  number  enclosed 
in  asterisks  or  a  standard  SOUPAC  input  address.   If  an  address  is 
specified,  the  first  element  of  the  matrix  at  the  address  is  used 
for  the  floating  point  number.  Examples: 

SCALAR (SEQ1) *2.* (SEQ2).
SCALAR (SEQ4)(SEQ3)(SEQ2).

SINGLE  (mnemonic:   SIN) 

The  SINGLE  operation  has  anywhere  from  one  to  twenty-one  addresses 
as  parameters.   Listing  an  address  as  a  parameter  causes  any  matrices 
written  on  that  address  to  be  written  in  single  precision.  MATRIX 
stores  all  data  matrices  in  double  precision  unless  the  user  specifies 
otherwise  with  the  suboperation  SINGLE.   The  listing  of  an  address  in 
a  SINGLE  statement  in  one  MATRIX  program  does  not  carry  over  in  effect 
to  any  other  MATRIX  program.  Example : 

SINGLE (SEQ1)(SEQ2)(SEQ4).

SUBTRACT (mnemonic: SUB)

The SUBTRACT operation has from three to twenty-one address
parameters. The last address is the output address; all other addresses
are for input. Each input matrix must have the same number of rows
and columns as all other input matrices for the use of the operation.
Elements of the second matrix through the next to last matrix are
subtracted from corresponding elements of the first matrix. Output
goes to the last address. An address may be used more than once as
an input address. Examples:

SUBTRACT (SEQ1)(SEQ3)(SEQ4).

SUBTRACT (SEQ1)(SEQ2)(SEQ3)(SEQ1)(SEQ5).

SUM   (mnemonic:   SUM) 

The  SUM  operation  has  three  operands;  an  input  address,  an  output 
address,  and  an  option  indicator. 

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single 
row  vector  containing  the  column  sum  of  each  column  of  the  input  matrix. 
Specifying  no  option  is  equivalent  to  specifying  option  0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a  single 
column  vector  containing  the  row  sum  of  each  row  of  the  input  matrix. 

If  the  option  is  specified  as  any  number  other  than  0  or  1,  a 
single  element  matrix  is  output  which  contains  the  sum  of  all  elements 
over  the  entire  matrix.   Examples: 

SUM  (SEQ1)(SEQ3). 
SUM  (SEQ1)(SEQ2)(2). 
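
The three options can be sketched in Python as follows. The name `matrix_sum` and the list-of-lists layout are assumptions for illustration only; the default of 0 mirrors the statement that omitting the option is equivalent to option 0.

```python
def matrix_sum(matrix, option=0):
    """Column sums (option 0), row sums (option 1), or the grand total (other)."""
    if option == 0:
        # single row vector holding the sum of each column
        return [[sum(col) for col in zip(*matrix)]]
    if option == 1:
        # single column vector holding the sum of each row
        return [[sum(row)] for row in matrix]
    # single-element matrix holding the sum over the entire matrix
    return [[sum(sum(row) for row in matrix)]]


a = [[1, 2], [3, 4]]
column_sums = matrix_sum(a)
row_sums = matrix_sum(a, 1)
total = matrix_sum(a, 2)
```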

TRANSPOSE (mnemonic: TRA)

The TRANSPOSE operation transposes a matrix (interchanges rows
and columns). TRANSPOSE destroys any previous usage of the incore
address storage. The two parameters for TRANSPOSE are first the
input address and second the output address of the result. Example:

TRANSPOSE (SEQ1)(SEQ2).

UPPER TRIANGLE (mnemonic: UPP)

The UPPER TRIANGLE operation copies a matrix from one
address ... the main diagonal elements are included ... Examples:

UPPER (S1)(S3).
UPPER (S2)(S4)(1).
VECTOR (mnemonic: VEC)


The  VECTOR  operation  has  two  operands,  an  input  address  and  an 
output  address.   A  single  vector  from  the  first  location  is  used 
to  form  a  diagonal  matrix  which  is  output  to  the  second  address. 
If  the  input  matrix  has  more  rows  than  columns,  the  first  column 
vector  is  used  to  form  the  diagonal  matrix.   If  the  input  matrix 
has  more  columns  than  rows,  the  first  row  is  used  to  form  the 
diagonal  matrix.  Example: 

VECTOR  (SEQ1)(SEQ3). 
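
A small Python sketch of this rule follows. The name `vector_to_diagonal` is hypothetical. The text does not say which vector is taken when the input is square, so the sketch falls back to the first row in that case; that choice is an assumption.

```python
def vector_to_diagonal(matrix):
    """Build a diagonal matrix from the first column or first row of `matrix`."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows > n_cols:
        vec = [row[0] for row in matrix]   # more rows than columns: first column
    else:
        vec = list(matrix[0])              # otherwise: first row (square case assumed)
    n = len(vec)
    return [[vec[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


from_row = vector_to_diagonal([[1.0, 2.0, 3.0]])
from_col = vector_to_diagonal([[4.0], [5.0]])
```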


VERTICAL  AUGMENT  (mnemonic:  VER) 

The  VERTICAL  AUGMENT  operation  has  from  three  to  twenty-one 
address  parameters.   The  last  address  is  the  output  address;  all 
other  addresses  are  for  input.   Each  input  matrix  must  have  the 
same  number  of  columns  as  all  other  input  matrices  for  the  use  of 
the  operation.  An  address  may  be  used  more  than  once  as  an  input 
address.

Input  matrices  from  the  first  address  through  the  next  to  the  last 
address  are  stacked  top  to  bottom  and  the  result  goes  to  the  last 
address.  All  input  matrices  must  have  the  same  number  of  columns. 
Example : 


VERTICAL AUGMENT (SEQ1)(SEQ2)(SEQ3).
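
The stacking rule can be sketched in Python. The name `vertical_augment` and the list-of-lists layout are assumptions for illustration; the last SOUPAC address (the output) corresponds here to the return value, and raising an error on mismatched column counts is an assumed way of enforcing the restriction.

```python
def vertical_augment(*matrices):
    """Stack the input matrices top to bottom; all must share a column count."""
    width = len(matrices[0][0])
    for m in matrices:
        if any(len(row) != width for row in m):
            raise ValueError("all input matrices must have the same number of columns")
    return [list(row) for m in matrices for row in m]


top = [[1, 2]]
bottom = [[3, 4], [5, 6]]
stacked = vertical_augment(top, bottom, top)   # an input may be used more than once
```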
Addendum to the MATRIX Program Description


KRONECKER PRODUCT (mnemonic: KRO)

The KRONECKER PRODUCT operation forms the Kronecker product
of two matrices and outputs the result to an output address.
The resulting output matrix is composed of m1 x n1 submatrices,
where m1 and n1 are the dimensions of the first input matrix.
Each submatrix has the size m2 x n2, where m2 and n2 are the
dimensions of the second input matrix. Note that the output
matrix has the dimensions m1m2 x n1n2. Each submatrix is the
result of the scalar product a(i,j) B.

For example, if matrix A is on S1 and B is on S2 where

       1.    2
A =   -1     1          B =   1    3
       0    .5                2    4

the result of executing the MATRIX statement

KRONECKER (S1)(S2)(S3).

would be the following matrix on S3.

 1    3    2     6
 2    4    4     8
-1   -3    1     3
-2   -4    2     4
 0    0     .5   1.5
 0    0    1     2
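
The block structure described above can be checked with a short Python sketch. The function name `kronecker` and the list-of-lists layout are assumptions for illustration. Each output row pairs one row of A with one row of B, and each element a(i,j) scales that whole row of B.

```python
def kronecker(a, b):
    """Kronecker product: every element a(i,j) is replaced by the block a(i,j) * B."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


A = [[1, 2], [-1, 1], [0, 0.5]]
B = [[1, 3], [2, 4]]
S3 = kronecker(A, B)   # 6 x 4, i.e. (3*2) x (2*2)
```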

RECIPROCAL OF SQUARE ROOT OF DIAGONAL (mnemonic: RSD)

The RSD operation has two operands, an input address and an
output address. The reciprocals of the square roots of the main
diagonal elements of the first matrix form a single
row vector, which is output to the second operand address. Example:

RSD  (SEQ1)(SEQ3). 
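
In Python the same computation might look like the sketch below. The name `rsd` and the list-of-lists layout are assumptions for illustration, and the sketch assumes the diagonal elements are positive.

```python
import math


def rsd(matrix):
    """Row vector of 1 / sqrt(d) for each main diagonal element d."""
    n = min(len(matrix), len(matrix[0]))
    return [[1.0 / math.sqrt(matrix[i][i]) for i in range(n)]]


scale = rsd([[4.0, 1.0], [1.0, 16.0]])
```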


These  programs  were  designed  after  the  bulk  of  the  MATRIX  program 
description  was  typed. 
II.  Special  Comments

A.   Incore  Address  Option 

Besides  the  standard  SOUPAC  addresses,  MATRIX  also  recognizes 
the additional address I. The I symbol as an address represents
internal storage in the machine.

An  obvious  use  of  this  feature  is  to  cut  down  on  I/O  time  for 
matrices  which  are  to  be  used  in  future  operations  within  the 
current  matrix  program.   The  internal  storage  feature  also  saves 
time  when  the  user  desires  his  output  from  an  operation  to  be 
printed  or  punched.   The  user  must  keep  in  mind  that  data  cannot 
be  passed  to  subsequent  programs  with  the  I  storage.   The  user 
should  also  be  aware  of  the  restrictions  on  I  storage  as  mentioned 
above  in  some  of  the  subparameter  operations  (see  INVERT,  MULTIPLY, 
SQUARE,  and  TRANSPOSE).   In  all  cases  the  use  of  this  option  is  not 
recommended  for  matrices  which  do  not  fit  within  the  memory  available 
to the MATRIX program while running within any particular region size.

1)  To add the matrix on SEQ1 to the matrix on SEQ2 leaving
the result in core and also printing the result, code as
follows:

ADD (SEQ1)(SEQ2)(I/PRINT).

2)  To vertically augment the matrices in core, on SEQ1 and on
SEQ2, storing the result on SEQ4, code as follows:

VERTICAL AUGMENT (I)(SEQ1)(SEQ2)(SEQ4).


B.   Labeled Output

Provided  in  the  MATRIX  program  is  the  facility  to  title  and  put 
column  labels  on  any  matrix  which  is  printed  using  normal  SOUPAC 
print  conventions.   The  labeling  feature  is  not  allowed  with  the 
PRINT  matrix  operation. 

To  use  the  labeling  feature,  it  is  first  necessary  to  put  the 
title  and  labels  in  a  temporary  storage  area.   This  is  accomplished 
with the LABEL operation (see Subparameters).

To  use  a  label  which  has  been  placed  in  a  temporary  storage  area, 
code  (L)  after  the  print  portion  of  the  output  address.   If  F 
format  is  also  desired  code  either  (F,L)  or  (L,F)  after  the  print. 
Examples:

1)  To move (copy) the matrix on SEQ1 onto SEQ5 printing the
result in F format with title and column labels, code as
follows:

MOVE (SEQ1)(SEQ5/PRINT(F,L)).

2)  To add the matrices on SEQ1 and SEQ2 storing the result on
SEQ3  and  also  printing  the  result  with  title  and  column 
labels,  code  as  follows: 

ADD  (SEQ1)(SEQ2)(SEQ3/PRINT(L)). 

3)  To  transpose  the  matrix  stored  on  SEQ1  to  SEQ2,  printing 
out  the  result  in  F  format  with  title  and  column  labels, 
and  punching  out  a  card  deck  of  the  transposed  matrix, 
code  as  follows : 

TRANSPOSE  (SEQ1)(SEQ2/PRINT(L,F)/X). 


TRANSFORMATIONS 
I.   General  Description 

TRANSFORMATIONS  is  a  data  manipulation  program.   Unlike  MATRIX,  which  performs 
operations  on  a  complete  matrix,  TRANSFORMATIONS  operates  upon  matrices  one  row 
at a time. This strategy provides for almost unlimited flexibility in
transforming your data. Some of the general uses include creating new variables as
functions of present variables, recoding or collapsing data, and reordering or
eliminating variables. More advanced uses are facilitated by an instruction
set which allows testing and branching depending on single variable
characteristics or relations between variables, indirect addressing (FLAG-notation), and
inputting and outputting to and from different sequential units during the program.

TRANSFORMATIONS  serves  several  purposes  in  the  SOUPAC  system.   First,  it 
can  be  used  as  a  stand  alone  program  to  perform  computations  on  your  input  data 
and  yield  the  final  results.   Also,  it  can  be  used  to  prepare  your  data  for 
input  into  another  SOUPAC  program  or  to  make  modifications  from  the  output  of 
one  program  for  input  into  another. 

II.   Parameters 

Parameter
Number      Use or Meaning

1           Input address of the main input matrix:
            CARDS or SEQUENTIAL 1-15. This parameter can
            be left out if there is no input or if input
            comes from the INPUT instruction.

The  format  of  the  entire  TRANSFORMATIONS  program  follows : 

TRANSFORMATIONS (C or Sn, n <= 15).

subparameter cards which describe the transformation to be performed

END  PROGRAM 

III.   Programming  Considerations 

The TRANSFORMATIONS program, as described in the first section, reads in one
row of data at a time, executes the program until the end or until
a last card instruction, and then repeats the identical program for each
remaining row of data.

There are 2000 variables allowed in the TRANSFORMATIONS program. Before
each row of data is read into the program, variables 1 through 1000 are filled
with zeroes. Variables 1001 through 2000 are zeroed out only before the program
begins, so that during the program they can be used for accumulating sums or
related totals as well as for temporary storage in row
manipulations.
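
The per-row cycle and the two variable ranges can be pictured with the following Python sketch. It is not SOUPAC code: the names `run` and `accumulate` are hypothetical, the "program" is modeled as an ordinary function, and loading each input row into variables starting at variable 1 is an assumption made for the illustration.

```python
N_VARS = 2000


def run(rows, program):
    """Execute `program` once per input row and return the final variables."""
    # Index 0 is unused so that variable numbers are 1-based, as in the text.
    variables = [0.0] * (N_VARS + 1)
    for row in rows:
        # Variables 1-1000 are cleared before every row is read.
        for i in range(1, 1001):
            variables[i] = 0.0
        # Assumed: the row is loaded starting at variable 1.
        for k, value in enumerate(row, start=1):
            variables[k] = value
        program(variables)
    return variables


def accumulate(v):
    # Variable 1001 is never cleared between rows, so it can hold a running total.
    v[1001] += v[1] + v[2]


final = run([[1.0, 2.0], [3.0, 4.0]], accumulate)
```

After the run, `final[1001]` holds the total over both rows, while `final[1]` and `final[2]` hold only the values of the last row read.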

Input  to  the  program  can  be  specified  by  the  parameter  on  the  TRANSFORMATIONS 
card  or  by  the  INPUT  instruction.   Output  can  only  be  achieved  by  use  of  the 
OUTPUT  instruction. 

IV.   Semi-table  of  Contents 

Section 5 - Subparameter List              Section 9  - DO-notation
Section 6 - Labels                         Section 10 - FLAG-notation
Section 7 - Subparameter Descriptions      Section 11 - A TRANSFORMATIONS Example
Section 8 - EBCDIC Tables                  Section 12 - Notes and Ideas


V.   Subparameter  List 
mnemonic  notes  operation name              examples

ABO              Abort                       ABORT.
ABS       1      Absolute Value              ABS (1)(21).
ADD       1,2    Add                         ADD (1)(2)(21).
                                             ADD (1)(3)(5)(7)(22).
ANG       1      Angle to Radians            ANG (1)(21).
A-C       1      Arccosine                   A-C (1)(21).
A-S       1      Arcsine                     A-S (1)(21).
A-T       1      Arctangent                  A-T (1)(21).
C-G              Computed Go To              C-G (100)"A""B""C""D".
CON       3      Constant                    CON (101)*4.3*.
                                             CON (102)(7).
COS       1      Cosine                      COS (1)(21).
DIF       2      Difference if               DIF (1)(5)"A""B""C".
DIV       1,2    Divide                      DIV (1)(2)(21).
                                             DIV (1)(2)(21)(3).
                                             DIV (1)(2)(21)(3)"A".
EBC       1      EBCDIC                      EBC (1)(21).
EXC       1      Exchange                    EXC (1)(2).
EXI              Exit                        EXIT.
EXP       1      Exponent Base e             EXP (1)(21).
FAC       1      Factorial                   FAC (1)(21).
FIX       1      Fixed Point Conversion      FIX (1)(21).
FLO       1      Floating Point Conversion   FLO (1)(21).
GO               Go to                       GOTO "A".
IF               Arithmetic If               IF (1)"A""B""C".
INP              Input from Unit             INPUT (S2)(200).
LAS              Last                        LAST.
ELO       1      Log base e                  ELOG (1)(21).
LOG       1      Log base 10                 LOG (1)(21).
MAX       1,2    Maximum Value               MAX (1)(2)(21).
                                             MAX (1)(3)(5)(7)(9)(22).
MIN       1,2    Minimum Value               MIN (1)(2)(21).
                                             MIN (1)(3)(5)(7)(9)(22).
MOD       1,2    Modular Arithmetic          MOD (1)(5)(21).
MOV       1      Move                        MOVE (1)(21).
mnemonic  notes  operation name              examples

MUL       1,2    Multiply                    MUL (1)(2)(21).
                                             MUL (1)(3)(5)(22).
NO               No operation                NOOP.
OUT       3      Output to Unit              OUT (S3)(1,50).
PER       3      Permute                     PER (100)(5)(1,4)(8,9)(7).
RAD       1      Radians to Angle            RAD (1)(21).
RAI       1,2    Raise to Power              RAI (1)(2)(21).
REC       1,2    Recode                      REC (1)"GT"(2)(3)*11*.
                                             REC (7)"EQ"*3*(7)*0*(7)*1*.
SIG       1      Sign Transfer               SIGN (1)(22).
SIN       1      Sine                        SINE (1)(21).
SKI       2      Skip record on unit         SKIP (S2)(100).
                                             SKIP (S3)*1*.
SQU       1      Square root                 SQU (1)(21).
SUB       1,2    Subtract                    SUB (1)(2)(21).


XAD  -  add
XDI  -  divide
XMU  -  multiply
XSM  -  sum
XSU  -  subtract

These fixed point instructions are identical in operations performed
and format to the corresponding floating point instructions, except
that all variables appearing in these instructions must be in fixed
point representation.


1.   For all parameters containing variables or floating point constants,
DO-notation may be used to indicate repetition of the operation on different
sets of variables and/or floating point constants. (Explanation of
DO-notation appears in Sec. 9.)

2.   All input parameters of the instruction which contain variables may be
replaced by floating point constants.

3.   For all parameters containing variables, DO-notation ranges may be
substituted for a single variable occurrence. (DO-notation ranges are
described in Sec. 9.)
VI.   Transformations  Labels

In  TRANSFORMATIONS  there  are  several  instructions  which  perform 
some  kind  of  test  on  your  data.   In  most  cases,  the  results  of  that 
test  cause  the  program  not  to  execute  the  next  sequential  instruction, 
but  to  branch  to  some  other  statement  in  the  TRANSFORMATIONS  program 
and continue executing with that statement. To refer to the
statements to which we wish to transfer, TRANSFORMATIONS labels are
employed.

The  form  of  these  labels  can  be  illustrated  by  the  following 
example  which  skips  over  the  divide  statement  if  the  divisor  is  zero. 


IF (3) "NEXT" "JUMP" "NEXT".
"NEXT" DIV (2)(3)(4).
"JUMP" next statement


or 

IF (3) "*+1" "*+2" "*+1".
DIV (2)(3)(4).

next  statement 

The  preceding  equivalent  examples  exhibit  the  two  types  of  TRANSFORMATIONS 
labels.   The  syntactical  rules  which  govern  the  two  types  of  labels 
follow. 

Type  1 

A.  A type 1 label consists of eight or fewer alphanumeric characters
set off by a pair of quotes.

B.  Alphanumeric characters consist of the alphabet from A to Z
and the numeric digits from 0 to 9.

C.  Any  unique  label  may  appear  as  an  operand  or  branch  address  of 
any  number  of  TRANSFORMATIONS  subparameter  instructions. 

D.  Any label which appears as an operand or branch address of an
instruction must also appear immediately preceding, and as part of,
exactly one TRANSFORMATIONS subparameter instruction
in the present TRANSFORMATIONS program.

Type  2 

A.  A  type  2  label  consists  of  a  positional  reference  of  the 
form  *+n  set  off  by  a  pair  of  quotes. 

B.  The symbol * points to the statement in which it appears.
Therefore *+1 would point to the next statement, *+2 would skip
one statement, and *-1 would point to the preceding statement.

C.  This  type  of  label  need  only  appear  as  an  operand  or  branch 
address  and  not  before  the  TRANSFORMATIONS  subparameter 
instruction  which  it  is  referencing. 

D.  Since  the  statement  to  which  you  are  going  to  branch  is  always 
indicated  relative  to  the  statement  from  which  you  are  branching, 
it  is  possible  to  point  to  an  address  which  would  lie  beyond  the 
end  of  the  program  or  before  the  beginning  of  the  program. 
Needless  to  say,  this  would  result  in  an  error  condition. 

E.  Branching  to  "*+0"  would  create  an  infinite  loop  and  is  also 
illegal. 
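
The type 2 rules can be summarized in a short Python sketch. The function name `resolve_relative` is hypothetical, statements are assumed to be numbered from 1, and signalling the error conditions with `ValueError` is a choice made for the illustration.

```python
def resolve_relative(label, current, program_length):
    """Turn a label such as '*+2' or '*-1' into a 1-based statement number.

    `current` is the number of the statement that contains the label.
    """
    if len(label) < 3 or label[0] != "*" or label[1] not in "+-":
        raise ValueError("expected a label of the form *+n or *-n")
    offset = int(label[1:])          # int() accepts the leading sign
    if offset == 0:
        raise ValueError("branching to *+0 would loop forever")
    target = current + offset
    if not 1 <= target <= program_length:
        raise ValueError("branch target lies outside the program")
    return target


# In a three-statement program, an IF in statement 1 can reach 2 or 3.
next_stmt = resolve_relative("*+1", 1, 3)
skip_one = resolve_relative("*+2", 1, 3)
```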
VII.  Subparameter  Description
ABO 

The  ABORT  instruction  causes  immediate  termination  of  the  TRANSFORMATIONS 
program  and  the  entire  SOUPAC  program.   This  instruction  is  often 
transferred  to  when  internal  tests  reveal  incorrect  data.  Example: 

ABORT. 

ABS 

The  ABSOLUTE  VALUE  instruction  takes  the  absolute  value  of  the  first 
variable  and  stores  it  into  the  second  variable.  Example: 

ABS  (3) (25). 

ADD 

The  ADD  instruction  has  from  three  to  one  hundred  parameters 
pointing  to  variables.   The  first  variable  through  the  next  to  last 
variable  are  summed  and  the  result  is  stored  into  the  last  variable. 
Examples:

ADD  (6) (7) (23). 

ADD  (1) (3) (5) (7) (9) (100). 

ANG 

The  ANGLE  TO  RADIANS  instruction  converts  the  first  variable,  which 
should  be  a  measure  of  an  angle,  into  radians  and  stores  the  result 
into  the  second  variable.  Example: 

ANG  (3) (17). 

A-C 

The  ARCCOSINE  instruction  takes  the  arccosine  of  the  first  variable 
and stores it into the second variable. The first variable must be
between minus one and one inclusive. The result will be stored in radians.
Example:


A-C (7)(34).


A-S 


The  ARCSINE  instruction  takes  the  arcsine  of  the  first  variable  and 
stores  it  into  the  second  variable.   The  first  variable  must  be 
between  minus  one  and  one  inclusive.   The  result  will  be  stored  in 
radians. Example:

A-S (9)(13).

A-T

The  ARCTANGENT  instruction  takes  the  arctangent  of  the  first 
variable  and  stores  it  into  the  second  variable.  Example: 

A-T  (8) (70). 

C-G 

The COMPUTED GO TO has from two to twenty-two parameters. The
first parameter contains a variable and the following parameters contain
labels. The basic form is

C-G (v) "L1" "L2" "L3" ... "Ln".   where n <= 21

The variable must be floating point. If it is not of integral value,
it is truncated (all digits to the right of the decimal point are
dropped). The instruction will then branch to the label whose position
in the list is equal to the integral value of the variable. Example:

C-G (7) "A" "B" "C" "D" "E".

If variable 7 is equal to 4.0 or 4.3 then the instruction will branch
to label "D".
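
The selection rule can be sketched in Python as follows. The name `computed_goto` is hypothetical. The text does not say what happens when the truncated value falls outside the label list, so raising an error in that case is an assumption.

```python
import math


def computed_goto(value, labels):
    """Return the label whose 1-based position equals the truncated value."""
    position = math.trunc(value)     # drop digits to the right of the decimal point
    if not 1 <= position <= len(labels):
        raise ValueError("no label at position %d" % position)
    return labels[position - 1]


labels = ["A", "B", "C", "D", "E"]
target = computed_goto(4.3, labels)
```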

CON 

The  CONSTANT  instruction  contains  from  two  to  one  hundred  parameters. 
The  instruction  is  used  to  assign  constant  values  to  variables.   The 
first  parameter  indicates  either  a  variable  or  range  of  variables. 
The subsequent parameters contain the fixed point constant(s) and/or
the floating point constant(s) which are assigned to the variable(s).
Examples:

CON(7)(99)          assigns fixed point value 99 to variable 7.
CON(8)*4.3*         assigns floating point value 4.3 to variable 8.
CON(7,8)(99)*4.3*   is equivalent to the previous two together.

The more complicated structures, which contain ranges and increments for
the variables and constants, use DO-notation, which is explained in Sec. 9.
An  example  of  the  CONSTANT  instruction  with  that  structure  will  be  included 
in  that  section. 

COS 

The  COSINE  instruction  takes  the  cosine  of  the  first  variable  and 
stores  it  into  the  second.   The  first  variable  must  be  expressed  in 
radians.  Example: 


COS  (1)(17). 
DIF

The  DIFFERENCE  IF  instruction  contains  five  parameters.   The  first 
two  parameters  point  to  variables  and  the  next  three  contain  labels. 
The  second  variable  is  subtracted  from  the  first.   If  the  difference  is 
negative  the  instruction  branches  to  the  first  label,  if  the  difference 
is  zero  it  branches  to  the  second  or  middle  label,  and  if  the  difference 
is  positive  it  branches  to  the  third  or  last  label.  Example: 

DIF  (3)  (5)  "A"  "B"  "C". 


If  variable  3  minus  variable  5  is  negative  the  instruction  will  branch 
to  label  "A". 

DIV 

The  DIVIDE  instruction  contains  from  three  to  five  parameters.   In  the 
case  of  three  parameters,  the  first  variable  is  divided  by  the  second  variable 
and  the  result  is  stored  in  the  third  variable.   A  division  by  zero  will 
terminate the program. Example:

DIV (1)(2)(30).

In the case of four parameters, the division will take place as normal except
when the second variable is zero. In that event, the fourth variable will
be stored into the third variable as a supplied quotient. Example:

DIV (1)(2)(30)(31).

If the fifth parameter is added, it indicates a label to be branched to
in the event of a division by zero. The branch will take place after
the supplied quotient is stored into the third variable. Example:

DIV (1)(2)(30)(31)"ZERO".
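
The three forms of DIVIDE can be sketched in Python. The name `divide`, the dictionary used to hold the numbered variables, and the use of `ZeroDivisionError` to stand for program termination are all assumptions made for the illustration.

```python
def divide(variables, a, b, out, fallback=None, label=None):
    """Store variables[a] / variables[b] in variables[out].

    On a zero divisor: with no fallback the program terminates (modeled as
    an exception); otherwise the supplied quotient is stored first and the
    branch label, if any, is returned. Returns None when no branch is taken.
    """
    if variables[b] == 0:
        if fallback is None:
            raise ZeroDivisionError("division by zero terminates the program")
        variables[out] = variables[fallback]
        return label
    variables[out] = variables[a] / variables[b]
    return None


v = {1: 6.0, 2: 0.0, 30: 0.0, 31: -1.0}
branch = divide(v, 1, 2, 30, 31, "ZERO")   # like DIV (1)(2)(30)(31)"ZERO".
```

Here `branch` is the label to transfer to and variable 30 already holds the supplied quotient from variable 31 when the branch is taken.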

EBC 

The  EBCDIC  instruction  converts  characters,  which  are  read  into  the 
program by means of an A1 format field, into floating point numbers.
The first variable contains the character. The result after the
table look up will be stored into the second variable. Example:

EBC (6)(89).

The  table  used  for  the  conversions  is  located  following  the 
subparameter  descriptions.  This  instruction  is  often  used  to  prepare 
character  codes  for  input  to  a  SOUPAC  FREQUENCY  program. 

EXC 

The  EXCHANGE  instruction  exchanges  the  contents  of  two  variables. 
Example:

EXC (1)(2).

EXI

The EXIT instruction causes immediate termination of the TRANSFORMATIONS
program, but will continue to execute the SOUPAC program which follows.
Example:

EXIT.

EXP

The  EXPONENT  BASE  e  instruction  raises  e  to  the  power  of  the  first 
variable  and  stores  the  result  in  the  second  variable.  Example: 

EXP  (31) (704). 

FAC 

The  FACTORIAL  instruction  calculates  the  factorial  of  the  first 
variable  and  the  result  is  stored  in  the  second  variable.  Example: 

FAC (87)(120).

FIX

The FIXED POINT CONVERSION instruction converts the floating point
variable indicated by the first parameter to fixed point and stores the
result into the second variable. Example:

FIX  (2) (9). 

FLO 

The FLOATING POINT CONVERSION instruction converts the fixed point
variable indicated by the first parameter to floating point and stores
the result into the second variable. Example:

FLOAT  (7) (8). 

GO 

The  GO  TO  instruction  unconditionally  branches  to  the  label  indicated 
by  the  only  parameter.  Example: 

GO  TO  "LABEL". 

IF 

The  IF  statement  has  four  parameters.   The  first  parameter  contains 
a  variable  and  the  remaining  three  parameters  are  labels.   If  the 
variable  is  negative,  zero,  or  positive  the  instruction  will  branch  to 
the  first,  second,  or  third  label  respectively.  Example: 

IF (33) "L1""L2""L3".

If variable 33 is positive the instruction will branch to label "L3".

INP 

By use of the INPUT instruction you can read in rows of
data in addition to the rows from the main input matrix. The first parameter
indicates the sequential unit from which the data will be input. The
second parameter indicates the starting variable number where you
wish the row to be read in. Example:

INPUT (S3)(500).

If S3 contains 64 variables then this input instruction will read the
next row from S3 and store it in variables 500 through 563. Be careful
to avoid overwriting existing variables which you need, and to avoid
trying to read more or fewer rows than exist on a particular unit.

LAS 

The LAST CARD instruction allows instructions to be performed after
the last row of data has been read in and processed. The LAST instruction
divides a program into regular and last card segments. The regular
segment, like a TRANSFORMATIONS program without the LAST option, is
executed once for every row of data. After all the main input data
has been processed, the last card segment is executed once. One of the
main uses of the LAST instruction is to analyze data accumulated in
variables 1001-2000 during the regular segment. Example:

LAST. 

Only  one  LAST  instruction  may  be  used  for  TRANSFORMATIONS  programs  and 
branching  between  regular  and  last  card  segments  is  prohibited. 
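
The division into regular and last card segments can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not SOUPAC code: the function name `run_transformations` is hypothetical, and the two local accumulators stand in for variables in the 1001-2000 range.

```python
def run_transformations(rows):
    """Mimic a program split by LAST: one pass per row, then one final step."""
    count = 0.0  # stands in for an accumulator such as variable 1001
    total = 0.0  # stands in for an accumulator such as variable 1002

    # Regular segment: executed once for every row of input data.
    for row in rows:
        count += 1.0
        total += row[0]

    # Last card segment: executed once, after all rows are processed.
    mean = total / count if count else None
    return count, mean
```

For three rows holding 2.0, 4.0, and 6.0 the regular segment runs three times and the last card segment then divides the accumulated total by the row count.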

ELO 

The  LOG  BASE  e  instruction  takes  the  natural  log  of  the  first  variable 
and  stores  it  in  the  second  variable.  Example: 

ELOG (3) (17).

LOG

The  LOG  BASE  10  instruction  takes  the  base  10  log  of  the  first 
variable  and  stores  it  into  the  second  variable.  Example: 

LOG (18) (34).

MAX 

The MAXIMUM VALUE instruction has from three to one hundred parameters
pointing to variables. The largest value found among the first variable
through the next to last variable is stored into the last variable.
Examples:

MAX (1)(2)(7).

MAX (1) (7) (8) (11) (15).


MIN 


The MINIMUM VALUE instruction has from three to one hundred parameters
pointing to variables. The smallest value found among the first variable
through the next to last variable is stored into the last variable.
Examples:

MIN (11) (13) (29).

MIN (1) (3) (4) (7) (8) (10) (12).

MOD

The MODULAR ARITHMETIC instruction finds the value of the first
variable modulo the second variable and stores the result into the
third variable. Example:

MOD (1)(4)(7).

MOV

The MOVE instruction stores a copy of the first variable into the
second variable. If a value already exists in the second variable it
will be overwritten. Example:

MOVE (3)(9).

MUL 

The  MULTIPLY  instruction  has  from  three  to  one  hundred  parameters 
pointing  to  variables.   The  first  variable  through  the  next  to  last 
variable  are  multiplied  together  and  the  result  is  stored  in  the 
last  variable.  Examples: 

MUL (1)(2)(20).

MUL (3) (4) (5) (6) (10).

NO 

The NOOP instruction does nothing. Its primary use is as a labeled
placeholder in the TRANSFORMATIONS subprogram to which many different
instructions branch. It is commonly used at the end of a TRANSFORMATIONS
subprogram where several isolated groups of instructions all need to
branch to the end. Example:

"END" NOOP.


OUT 

The OUTPUT TO UNIT instruction is the only way the TRANSFORMATIONS
program can output a row of data. There are two parameters. The
first indicates the unit to which the row of variables should be output.
The second parameter can be either a single variable or a range of
variables. Examples:

OUT (S2)(7).

This will output onto S2 a row with one variable.

OUT (S4)(8,14).

This will output onto S4 a row with seven variables, variable eight
through variable fourteen. All outputs to the same unit must have the
same number of variables output. A more detailed description of the types
of ranges which are allowed appears in Sec. IX on DO-notation.
PER

The PERMUTE instruction permutes the order of all or a subset of
your variables. It can have from two to one hundred parameters, all
indicating variable numbers. The first parameter indicates the starting
point at which a string of variables should be placed. The rest of the
parameters compose that string of variables. Each of the parameters
in that string represents either a single variable or a range of variables.
Examples:

PER (100)(3)(10,14)(4,5).

This example places variable 3 in variable 100, variables 10 through 14
in variables 101 through 105, and variables 4 and 5 in variables
106 and 107. A more detailed description of the types of ranges which
are allowed appears in Sec. IX on DO-notation ranges.

PER (2)(1,1999).

This example will not propagate variable 1 through all the variables.
It performs the intended purpose of shifting every variable up by one
position: variable 1 into variable 2, variable 2 into variable 3, and
so on.
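
The remark about propagation can be illustrated with a minimal Python sketch. It assumes that PERMUTE behaves as if every source value were read before any target is written; the manual states the outcome but not the mechanism, so the snapshot below is an assumption, and the names `permute` and `expand_range` are hypothetical.

```python
def expand_range(spec):
    """Turn a single variable number or a (first, last) pair into a list."""
    if isinstance(spec, int):
        return [spec]
    first, last = spec
    return list(range(first, last + 1))


def permute(variables, start, *sources):
    """Copy the listed source variables into consecutive slots from start."""
    snapshot = dict(variables)  # assumed: sources are read before any write
    target = start
    for spec in sources:
        for source in expand_range(spec):
            variables[target] = snapshot[source]
            target += 1
    return variables
```

With five variables holding 10, 20, 30, 40, 50, the call `permute(v, 2, (1, 4))` leaves variable 1 unchanged and moves the old values of variables 1 through 4 into variables 2 through 5, rather than filling them all with 10.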

RAD 

The  RADIANS  TO  ANGLE  instruction  converts  the  first  variable,  which 
should  be  a  measure  of  radians,  into  an  angle  and  stores  the  result  into 
the  second  variable.   Example: 


RAD (9)(13).


RAI 


The RAISE instruction raises the first variable to the power contained in
the second variable and stores the result into the third variable.
Example:

RAISE (1)(2)(7).

REC

The  RECODE  instruction  recodes  variables  depending  on  the  satisfaction 
of  a  set  of  conditions.   The  sequence  of  parameters  depends  on  the 
number  of  conditions  that  must  be  met.   The  RECODE  instruction 
introduces  a  new  set  of  terminology  which  follows: 

A.  RELATIONAL OPERATORS

    "LT"    less than
    "LE"    less than or equal to
    "EQ"    equal
    "NE"    not equal
    "GE"    greater than or equal
    "GT"    greater than

B.  CONNECTIVES

    "AND"
    "OR"
    "EOR"

C.  CONDITION SET

A  condition  set  consists  of  three  parameters,  the  first  and 
third  parameters  pointing  to  variables  and  the  second 
parameter  containing  a  relational  operator.   Any  condition  set  is 
either  true  or  false  depending  upon  whether  the  two  variables 
satisfy  the  conditions  of  the  relational  operator. 

D.  RECODE  SET 

A  recode  set  consists  of  two  parameters.   The  first  parameter 
indicates  a  variable.   The  second  parameter  indicates  a  variable 
or  a  floating  point  constant.   If  a  recode  set  is  executed 
the  value  of  the  second  variable  or  floating  point  constant 
is  stored  into  the  first  variable. 

The RECODE instruction consists of from one to twenty-one condition
sets, joined together by connectives when there is more than one. This is
followed by a recode set to be executed if the logical product of the
condition sets is true and, optionally, a second recode set to be executed
if the logical product is false. Examples:

REC (4) "EQ" *3* (4) *1* (4) *0*.

If variable 4 equals 3.0 then recode it to 1.0; if not, then recode it
to 0.0.

REC (6) "GE" (10) "AND" (7) "LE" (20) (110) (111).

If variable 6 is greater than or equal to variable 10 and variable 7 is
less than or equal to variable 20 then recode variable 110 to the
value of variable 111.

SIG 

The  SIGN  instruction  places  the  sign  of  the  first  variable  on 
the  second  variable.   It  is  often  used  for  saving  signs  of  variables 
during  intermediate  calculations.   Example: 

SIGN (1)(901).

SIN 

The  SINE  instruction  takes  the  sine  of  the  first  variable  and  stores 
it  into  the  second  variable.   Example: 

SIN (1)(11).

SKI

The SKIP instruction has two parameters. The first parameter is a
sequential unit. The second is a variable or floating point number.
The number in the variable or the floating point number indicates the
number of rows to be skipped on the specified unit. Examples:

SKIP (S2)(7).
SKIP (S5) *1*.

SQU

The  SQUARE  ROOT  instruction  takes  the  square  root  of  the  first 
variable  and  stores  it  into  the  second  variable.   Example: 

SQU (13)(14).

SUB 

The  SUBTRACT  instruction  subtracts  the  second  variable  from  the  first 
variable and stores the result into the third variable. Example:

SUB (7)(9)(10).

XAD - fixed point addition
XDI - fixed point divide
XIF - fixed point if
XMU - fixed point multiply
XSM - fixed point sum
XSU - fixed point subtract

The  preceding  fixed  point  instructions  have  the  same  parameters  as 
the  corresponding  floating  point  instructions.   The  variables  used 
in  the  fixed  point  instructions  must  be  the  fixed  point  representation 
rather  than  the  normal  floating  point  representation. 
VIII. EBCDIC Conversion Table


Character   Floating Point Number      Character   Floating Point Number

blank            -0.                       ¢            74.
0                 0.                       .            75.
1                 1.                       <            76.
2                 2.                       (            77.
3                 3.                       +            78.
4                 4.                       |            79.
5                 5.                       &            80.
6                 6.                       !            90.
7                 7.                       $            91.
8                 8.                       *            92.
9                 9.                       )            93.
A                10.                       ;            94.
B                11.                       ¬            95.
C                12.                       -            96.
D                13.                       /            97.
E                14.                       ,           107.
F                15.                       %           108.
G                16.                       _           109.
H                17.                       >           110.
I                18.                       ?           111.
J                19.                       :           122.
K                20.                       #           123.
L                21.                       @           124.
M                22.                       '           125.
N                23.                       =           126.
O                24.                       "           127.
P                25.
Q                26.
R                27.
S                28.
T                29.
U                30.
V                31.
W                32.
X                33.
Y                34.
Z                35.
IX. DO-Notation

DO-notation is a facility provided in the TRANSFORMATIONS program to enable
a user to easily and compactly perform an operation on a set of variables
instead of performing that operation on each of the variables individually.
This concept of DO-notation corresponds to the concept of FORTRAN DO-loops.
The following form of DO-notation would be used in a parameter which
points to a variable.

(V1, V2, I)

V1 = the initial variable of the set

V2 = the criterion variable for termination of the set

I = the increment

Examples:

(1, 5, 1) points to variables 1, 2, 3, 4, and 5.

(1, 5, 2) points to variables 1, 3, and 5.

(4, 14, 3) points to variables 4, 7, 10, and 13.

If the increment is not specified it is assumed to be 1. Example:

(2, 10, 1) is equivalent to (2, 10).
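
The rule for expanding a DO-notation parameter can be sketched in Python as follows. The function name `do_range` is hypothetical and the sketch only mirrors the examples above; it assumes positive increments and that the set stops at the last value not exceeding V2.

```python
def do_range(v1, v2, increment=1):
    """List the variable numbers that the DO-notation (v1, v2, increment) names."""
    # range() excludes its stop value, so add 1 to include v2 itself.
    return list(range(v1, v2 + 1, increment))
```

Under these assumptions `do_range(4, 14, 3)` yields 4, 7, 10, 13: the termination criterion 14 is never reached exactly, so the set stops at 13.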

A.  The  major  use  of  DO-notation  is  to  indicate  repetition  of  an  instruction 
on  different  sets  of  variables.   Examples: 

ADD(1,5,2)(6,8)(12,14).     is equivalent to   ADD(1)(6)(12).
                                               ADD(3)(7)(13).
                                               ADD(5)(8)(14).

MUL(1,4)(100)(101,107,2).   is equivalent to   MUL(1)(100)(101).
                                               MUL(2)(100)(103).
                                               MUL(3)(100)(105).
                                               MUL(4)(100)(107).

If,  in  an  instruction  containing  several  parameters  using  DO 
notation,  the  sets  of  variables  are  unequal  in  length,  the  instruction 
will  cycle  until  the  longest  set  of  variables  has  been  satisfied. 
The  variables  used  in  the  shorter  sets  after  they  have  been  exhausted 
will  be  the  last  variables  of  that  set.   Example: 

MAX(1,7,3)(14,15)(13,15,2). is equivalent to   MAX(1)(14)(13).
                                               MAX(4)(15)(15).
                                               MAX(7)(15)(15).
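
The cycling rule for sets of unequal length can be sketched in Python. This is an illustration only: `expand_instruction` and `do_range` are hypothetical names, each parameter is written either as a single variable number or as a tuple `(first, last)` or `(first, last, increment)`, and shorter sets are padded with their last variable as described above.

```python
def do_range(spec):
    """Expand one parameter: a variable number or a DO-notation tuple."""
    if isinstance(spec, int):
        return [spec]
    first, last, *rest = spec
    step = rest[0] if rest else 1
    return list(range(first, last + 1, step))


def expand_instruction(*params):
    """Return the parameter tuples of the individual instructions generated."""
    sets = [do_range(p) for p in params]
    longest = max(len(s) for s in sets)
    # A set that runs out keeps repeating its last variable.
    padded = [s + [s[-1]] * (longest - len(s)) for s in sets]
    return list(zip(*padded))
```

For example, `expand_instruction((1, 7, 3), (14, 15), (13, 15, 2))` gives `(1, 14, 13)`, `(4, 15, 15)`, `(7, 15, 15)`, matching the MAX example.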

B.  The  secondary  use  of  DO-notation,  called  DO-notation  ranges,  is  used 
with  the  constant,  output,  and  permute  instructions.   Instead  of 
indicating  a  repetition  of  the  instruction  on  each  variable  in  the 
set,  the  instruction  is  executed  once.   The  parameter  in  which  the 
DO-notation range occurs now points to a string of variables.
Examples:

OUTPUT(S1)(2,7).     outputs the string of variables 2, 3, 4, 5, 6, and 7.

OUTPUT(S2)(3,7,2).   outputs the string of variables 3, 5, and 7.

PERMUTE (100)(3)(10,15)(20,24,2).

has two parameters of DO-notation ranges in one instruction. The
instruction places into variables 100 through 109 the following
variables: 3, 10, 11, 12, 13, 14, 15, 20, 22, 24.

Looking  at  the  CONSTANT  instruction,  we  see  DO-notation  ranges  used 
to  indicate  strings  of  fixed  and  floating  point  constants  as  well 
as  strings  of  variables. 

The first parameter of CONSTANT can point to a variable or a string
of variables. The latter is the only case which involves DO-notation
ranges, so the discussion will be confined to that case.

CON(2,4)(1)(2)(3).   places the fixed point constants one, two,
                     and three into the string of variables 2, 3, 4.

By using DO-notation ranges with fixed point constants the equivalent
instruction would be

CON(2,4)(1,3).

The next two examples are also equivalent. They both assign to
variables five through eight the floating point constants
2.5, 5.0, 7.5, and 10.0.

CON(5,8)*2.5**5.0**7.5**10.0*.

CON(5,8)*2.5,10.0,2.5*.

The initial value of the string of floating point constants is 2.5.
The termination criterion is 10.0. The increment is 2.5.

All three types of DO-notation ranges can be used together in the
CONSTANT instruction. The following example combines both of the
preceding sets of examples into one instruction.

CON(2,8)(1,3)*2.5,10.0,2.5*.

Note:   If,  when  using  DO-notation  ranges  with  the  CONSTANT  instruction, 
the  string  of  variables  is  unequal  in  length  to  the  total  number  of 
fixed  and/or  floating  point  constants  indicated,  the  string  of  constants 
is  truncated  if  it  is  longer  and  if  it  is  shorter  the  last  constant 
is  assigned  to  the  remainder  of  the  variables. 
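
The truncation and propagation rule in the note can be sketched in Python. The function name `assign_constants` is hypothetical, and the constants are passed as an already expanded list; the sketch does not parse CONSTANT parameters.

```python
def assign_constants(variables, first, last, constants):
    """Store constants into variables first..last, one per variable."""
    for offset, var in enumerate(range(first, last + 1)):
        # Past the end of the list, keep reusing the last constant;
        # surplus constants are never reached, so they are truncated.
        variables[var] = constants[min(offset, len(constants) - 1)]
    return variables
```

With variables 5 through 8 and the constants 2.5, 5.0, 7.5, 10.0 each variable receives one constant. With a single constant every variable in the string receives that constant.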
X. Flag Notation


Flag notation is the TRANSFORMATIONS version of indirect addressing. Instead
of a parameter pointing directly to a variable, it points indirectly to a
variable through another variable. The parameter points to a variable which
in turn points to another variable. This feature enables an instruction to
point to different sets of variables depending upon the values assigned to
the intermediate variables.

The  main  type  of  flag  notation  is  called  F-flag  notation.   F-flag  notation 
is  indicated  by  inserting  an  F  directly  after  the  variable  number.   Example: 

ADD(7F)(8F)(9F).

Restriction: The values in the intermediate variables of flag notation
(variables 7, 8, and 9 in this example) must be in fixed point
representation and must point to a valid variable number.

Let the notation Vn indicate variable n. Example: V7 indicates variable 7.

If  V7  =  100 
V8  =  150 
V9  =  180 

then  the  preceding  example  would  generate  after  the  indirect  addressing 
takes  place : 

ADD(100)(150)(180). 

F-flag  notation  can  also  be  used  with  DO-notation.   Example: 

If   V100 = 2     V110 = 6     V120 = 10
     V101 = 13    V111 = 16    V122 = 19
     V102 = 5     V112 = 9     V124 = 13

then MUL(100F,102F)(110F,112F)(120F,124F,2). would generate after indirect
addressing:

MUL(2)(6)(10). 

MUL(13)(16)(19). 

MUL(5)(9)(13). 

Note that the limits of the DO-notation are extracted from the intermediate
variables and not from the final variables.

The  other  type  of  flag  notation  is  called  D-Flag  notation.   Its  only 
use  is  with  DO-notation.   The  difference  between  it  and  F-flag  notation  is 
that  the  limits  of  the  DO-notation  with  D-flag  notation  are  derived  from  the 
final  variables  after  indirect  addressing  takes  place.   Example: 

Using  the  same  variable  values  as  above. 

MUL(100D,102D)(110D,112D)(120D,124D).

would  generate  after  indirect  addressing: 

MUL(2,5)(6,9)(10,13). 

or 

MUL(2)(6)(10). 
MUL(3)(7)(11).
MUL(4)(8)(12).
MUL(5)(9)(13). 
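
The difference between the two flag types can be sketched in Python. The function names `f_flag` and `d_flag` are hypothetical, and the dictionary holds the fixed point values of the intermediate variables from the example above.

```python
def f_flag(variables, first, last, step=1):
    """F-flag: step over the intermediate variables, then follow each pointer."""
    return [variables[i] for i in range(first, last + 1, step)]


def d_flag(variables, first, last, step=1):
    """D-flag: follow the two limit pointers, then step over the final range."""
    return list(range(variables[first], variables[last] + 1, step))


values = {100: 2, 101: 13, 102: 5,
          110: 6, 111: 16, 112: 9,
          120: 10, 122: 19, 124: 13}
```

Here `f_flag(values, 100, 102)` gives variables 2, 13, 5, while `d_flag(values, 100, 102)` gives variables 2, 3, 4, 5, which is the distinction the two MUL examples show.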
XI. A TRANSFORMATIONS Example

Let us consider a set of data consisting of ten variables. Variable
one contains a zero for females and a one for males. Variables two through
five and eight through ten all range from zero to ninety-nine and are to
be collapsed into the numbers one to four representing quartiles. Variables
six and seven are to be recoded into a dichotomous variable depending on
whether the value is five or not five. The output is then split into males
and females suitable for input into two separate FREQUENCY programs. Also
desired are averages for variables two through ten before recoding takes
place. Totals are kept during the regular segment, and then in the last
card segment these are divided by the sample size and printed.


TRANSFORMATIONS
INPUT (C)(1).                            inputs one row from cards
PERMUTE (6)(8,10)(6,7).                  reorders the variables
ADD (1001)*1*(1001).                     increments row number by one
ADD (1002,1010)(2,10)(1002,1010).        adds respective values to row totals
RECODE (2,8)"LT"*25*(2,8)*1*.            recodes first quartile values
RECODE (2,8)"GE"*75*(2,8)*4*.            recodes fourth quartile values
RECODE (2,8)"GE"*50*(2,8)*3*(2,8)*2*.    recodes third and second quartile values
RECODE (9,10)"EQ"*5*(6,7)*1*(6,7)*0*.    creates dichotomous variables
IF (1) "BAD""FEMALE""MALE".              branches based on male or female
"BAD" ABORT.                             aborts program due to bad data
"FEMALE" OUTPUT (S2)(2,10).              outputs female data
GO TO "END".                             branches around male output
"MALE" OUTPUT (S3)(2,10).                outputs male data
"END" NOOP.                              this instruction is only a placeholder
LAST.                                    indicates beginning of last card segment
DIVIDE (1002,1010)(1001)(1002,1010).     calculates averages
OUTPUT (P)(1001,1010).                   prints sample size and averages
END PROGRAM


The regular segment is executed once for every row of card input data.
Here the values are recoded and output, while row totals are kept
in variables greater than 1000. The last card segment is executed once
to calculate and print the averages.
XII. NOTES and IDEAS


1.  Missing  data  of  the  form  -0.0  can  be  differentiated  from  0.0 
only  in  the  recode  statement,  so  if  this  distinction  must  be 
made,  -0.0  should  first  be  recoded  to  another  value  before  testing. 

2.  The  valid  outputs  in  the  OUTPUT  instruction  are  PRINT  and/or 
Sn,  n<15. 

3.  When  collapsing  data  be  careful  not  to  overlap  your  recoding 
and  inadvertently  recode  values  twice  or  more. 

4.  The former ZAP and SPRAY instructions have been incorporated
into the CONSTANT instruction by using DO notation with
floating point constants.

5.  The former RECIPROCAL instruction can now be performed using
a floating point constant of *1* in the DIVIDE instruction.

6.   The  ONE  RECODE,  TWO  RECODE,  OPEN  WHEN,  and  CLOSED  WHEN  instructions 
have  all  been  replaced  by  the  RECODE  instruction.   For  those 
not  familiar  with  the  new  terminology  in  the  RECODE  instruction 
more  examples  appear  below. 

Given:     V1 = 10    V3 = 13    V5 = 23
           V2 = 23    V4 = 89    V6 = -7

The following condition sets have the respective truth values.

(1) "GT" (2) is false
(3) "LE" (4) is true
(2) "EQ" (5) is true
(6) "GE" *0* is false
(2) "NE" (5) is false
(6) "LT" (4) is true

If  two  or  more  condition  sets  are  joined  by  "AND",  they  must 
all  be  true  for  the  logical  product  to  be  true.   If  two  or  more 
condition  sets  are  joined  by  "OR",  then  the  logical  product  is  true 
if  any  of  the  condition  sets  are  true.   If  the  connectives  are  mixed, 
then  the  "AND"  connective  is  of  higher  precedence  than  the  other 
connectives  in  the  same  way  that  multiply  is  of  higher  precedence 
than  addition  in  ordinary  arithmetic. 
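
The precedence rule can be sketched in Python. This is an illustration under assumptions, not the RECODE implementation: the names `condition` and `combine` are hypothetical, condition sets compare two variables only (constants are omitted), and only "AND" and "OR" are handled because the text above does not define how "EOR" combines.

```python
import operator

RELATIONS = {"LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
             "NE": operator.ne, "GE": operator.ge, "GT": operator.gt}


def condition(variables, left, relation, right):
    """Evaluate one condition set that compares two variables."""
    return RELATIONS[relation](variables[left], variables[right])


def combine(truth_values, connectives):
    """Join truth values so that AND binds more tightly than OR."""
    groups = [[truth_values[0]]]
    for connective, value in zip(connectives, truth_values[1:]):
        if connective == "AND":
            groups[-1].append(value)   # extend the current AND group
        elif connective == "OR":
            groups.append([value])     # start a new AND group
        else:
            raise ValueError("only AND and OR are sketched here")
    return any(all(group) for group in groups)
```

With this grouping, true "OR" false "AND" false is read as true OR (false AND false), which is true; a strict left to right reading would give false.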


BASIC  POPULATION  STATISTICS  PACKAGE 


FREQUENCY  COUNTING  AND 
MEASURES  OF  ASSOCIATION 


General  Description 

This  program  computes  tables  of  the  frequency  of  occurrence  of 
values  that  input  variables  take,  and  where  appropriate,  measures  of 
association  may  be  computed.   Input  to  the  program  may  be  in  the  form  of 
previously  computed  tables  (on  which  measures  of  association  will  be 
computed)  or  may  be  in  the  form  of  raw  data.   Only  integer  numbers 
may  be  counted;  decimal  point  data  will  be  rounded.   Negative  values 
are  allowed. 

A.   FREQUENCY  COUNTING 

The  following  options  are  available : 

1.  Either  one-dimensional  or  two-dimensional  tables  may  be 
specified.   For  one-dimensional  counts,  the  frequency  of 
occurrence  for  each  value  of  the  variable  is  listed.  For 
two-dimensional  counts,  each  value  of  the  second  variable 
is  counted  separately  for  each  value  of  the  first  variable. 

2.  Control variables may be used which enable counting to be done in
up to 12 dimensions. If control variables are specified, data
must be presorted on these variables. When the value of any
variable designated a control variable changes from one row
to the next, counting is stopped and a new table is started.
Thus counting proceeds as long as the values of all control
variables remain constant.

3.  The  minimum  and  maximum  values  to  be  attained  may  be  specified 
separately  for  each  variable  to  be  counted  (or,  optionally, 
not  specified  at  all).   If  either  the  maximum  or  minimum  values 
are  not  specified,  they  will  be  determined  from  the  data  using 
an  extra  read  of  the  data.  Values  which  fall  below  the 
minimum  or  above  the  maximum  are  ignored.   This  capability  adds 
flexibility  to  the  program  and  may  be  an  appreciable  cost  saver. 
Its  misuse  by  gross  estimates  of  minimum  and  maximum  values 
can,  however,  be  costly. 

4.  For each cell in a one-dimensional table, the percentage of the
total sample counted in that cell will be printed if requested.
For two-dimensional tables, the percentages of the row
and column may also be requested.

5.  A weighting variable may be specified. Without a weighting
variable, frequency counts are advanced by one for each occurrence
of a value. When a weighting variable is used, the frequency
counts are advanced by the value of the weighting variable for
the row. Thus some rows of data may be given more importance
than others.

6.  Labels can be given for variables so that output is more
readable. Each label is restricted to eight characters or
less.

7.  Input  may  be  from  previously  computed  two-dimensional  tables, 
from  which  measures  of  association  can  be  directly  computed. 

B.   MEASURES  OF  ASSOCIATION 

The following coefficients are calculated and printed on option for
two-way tables:

1.   Chi-square and related coefficients

Let: $n$ = total population of the table

     $n_{ab}$ = number of Vertical classification a (column a)
     and Horizontal classification b (row b)

     $n_{a.} = \sum_b n_{ab}$

     $n_{.b} = \sum_a n_{ab}$

     $\alpha$ = number of rows
     $\beta$ = number of columns

Then:

\[ \text{chi-square} = \sum_a \sum_b \frac{\left( n_{ab} - n_{a.} n_{.b}/n \right)^2}{n_{a.} n_{.b}/n} \]

adjusted chi-square (Yates' correction for continuity) for 2x2
tables only:

\[ \text{adjusted chi-square} = \sum_a \sum_b \frac{\left( \left| n_{ab} - n_{a.} n_{.b}/n \right| - 1/2 \right)^2}{n_{a.} n_{.b}/n} \]

\[ C = \left( \frac{\text{chi-square}/n}{1 + \text{chi-square}/n} \right)^{1/2} \]

\[ T = \left( \frac{\text{chi-square}/n}{\sqrt{(\alpha - 1)(\beta - 1)}} \right)^{1/2} \]

C and T are measures of contingency and can be looked up in
contingency tables. The maximum expected frequency is also printed.
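
As a cross-check on these definitions, a minimal Python sketch of chi-square, C, and T follows. It is not the FREQUENCY program's code: the function name `chi_square_measures` is hypothetical, the table is passed as a list of rows (`table[b][a]` is the count for row b, column a), and every row and column total is assumed to be nonzero.

```python
import math


def chi_square_measures(table):
    """Return (chi-square, C, T) for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]          # n_.b for each row b
    col_totals = [sum(col) for col in zip(*table)]    # n_a. for each column a

    chi2 = 0.0
    for b, row in enumerate(table):
        for a, observed in enumerate(row):
            expected = col_totals[a] * row_totals[b] / n
            chi2 += (observed - expected) ** 2 / expected

    rows, cols = len(table), len(table[0])
    c = math.sqrt((chi2 / n) / (1 + chi2 / n))
    t = math.sqrt((chi2 / n) / math.sqrt((rows - 1) * (cols - 1)))
    return chi2, c, t
```

For the 2x2 table with rows (10, 20) and (20, 10), every expected frequency is 15, so chi-square is 4 x 25/15, about 6.67.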




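2.   Lambda coefficients

Let: $n_{am} = \max_b n_{ab}$

     $n_{mb} = \max_a n_{ab}$

     $n_{.m} = \max_b n_{.b}$

     $n_{m.} = \max_a n_{a.}$

\[ \text{Lambda} = \frac{\sum_a n_{am} + \sum_b n_{mb} - n_{.m} - n_{m.}}{2n - n_{.m} - n_{m.}} \]

\[ \text{Lambda H} = \frac{\sum_a n_{am} - n_{.m}}{n - n_{.m}} \]

\[ \text{Lambda V} = \frac{\sum_b n_{mb} - n_{m.}}{n - n_{m.}} \]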

Lambda  coefficients  will  be  indeterminate  if  all  values  lie  in 
one  column  or  row. 

Lambda H can be defined as the decrease in probability of error
in predicting the H-variable when knowledge of the value of the
V-variable is considered as opposed to random guessing of the
H-variable.

Ninety-five  per  cent  confidence  limits  are  calculated  and  printed 
for  Lambda  H  and  Lambda  V  using  the  methods  discussed  by  Goodman 
and  Kruskal  in  their  second  article.   (See  references).   Lambda 
is  always  between  Lambda  H  and  Lambda  V. 

3.   Weighted Lambda Coefficients

\[ \text{Weighted Lambda H} = \frac{\sum_a \dfrac{n_{am}}{n_{a.}} - \max_b \sum_a \dfrac{n_{ab}}{n_{a.}}}{\alpha - \max_b \sum_a \dfrac{n_{ab}}{n_{a.}}} \]

\[ \text{Weighted Lambda V} = \frac{\sum_b \dfrac{n_{mb}}{n_{.b}} - \max_a \sum_b \dfrac{n_{ab}}{n_{.b}}}{\beta - \max_a \sum_b \dfrac{n_{ab}}{n_{.b}}} \]

These are Lambda H and Lambda V calculated using the weighted quantities
$\frac{1}{\alpha} \frac{n_{ab}}{n_{a.}}$ and $\frac{1}{\beta} \frac{n_{ab}}{n_{.b}}$,
respectively, instead of $n_{ab}$.

No confidence limits are provided for the weighted lambda coefficients.


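4.   Gamma Coefficient

Let:

\[ P_S = \sum_a \sum_b n_{ab} \left[ \sum_{a' > a} \sum_{b' > b} n_{a'b'} \right] \]

\[ P_D = \sum_a \sum_b n_{ab} \left[ \sum_{a' > a} \sum_{b' < b} n_{a'b'} \right] \]

Then

\[ \text{Gamma} = \frac{P_S - P_D}{P_S + P_D} \]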

Ninety-five  per  cent  confidence  limits  for  Gamma  are  calculated 
using  the  method  outlined  and  preferred  in  the  second  article  by 
Goodman  and  Kruskal. 
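
The Gamma coefficient can be sketched in Python as a count of concordant and discordant pairs. The function name `gamma` is hypothetical, `table[a][b]` holds the count for ordered categories a and b, and the table is assumed to contain at least one concordant or discordant pair so that the denominator is nonzero. Confidence limits are not sketched.

```python
def gamma(table):
    """Return (PS - PD) / (PS + PD) for a two-way table with ordered classes."""
    n_a, n_b = len(table), len(table[0])
    ps = 0  # pairs ordered the same way on both classifications
    pd = 0  # pairs ordered oppositely on the two classifications
    for a in range(n_a):
        for b in range(n_b):
            for a2 in range(a + 1, n_a):
                for b2 in range(n_b):
                    if b2 > b:
                        ps += table[a][b] * table[a2][b2]
                    elif b2 < b:
                        pd += table[a][b] * table[a2][b2]
    return (ps - pd) / (ps + pd)
```

A table with all counts on the main diagonal gives Gamma = 1, and one with all counts on the opposite diagonal gives Gamma = -1.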

References 

These coefficients are discussed and compared by Leo A. Goodman and
William H. Kruskal in their article "Measures of Association for
Cross Classification", American Statistical Association Journal,
December, 1954.

The  Gamma  coefficient  is  their  suggested  measure. 

The  C  coefficient  was  first  suggested  by  Karl  Pearson  and  the  T 
coefficient  is  due  to  Tchuprow. 

The Lambda coefficients apparently were first suggested by Louis
Guttman ("The Prediction of Personal Adjustment", Bulletin 48,
Social Science Research Council, New York, 1941).

The  development  of  the  approximate  sampling  theory  and  of  the 
machinery  for  calculating  the  confidence  intervals  for  Lambda  and 
Gamma was done in a sequel article by Goodman and Kruskal: "Measures
of Association for Cross Classification III: Approximate Sampling
Theory", American Statistical Association Journal, June, 1963.

The  statistics  that  are  requested  will  be  printed  immediately 
following  each  table. 


II.   Restrictions

The program is limited currently to 450 input variables and 1000
tables. Tables are restricted to a maximum of 80,000 cells,
each of which can hold a maximum count of 32,767. As many tables as
will fit into work storage (80,000 cells) will be computed in each read
of the data. If tables will fit into 80,000 cells, card input is allowed.
If maxima and minima are not specified for card input, data will be
transferred to disk during preread of data.
III.   Parameters


Main Parameter Card

Immediately following the program name FREQUENCY (mnemonic: FRE), the
following parameters are listed, each enclosed in parentheses with a
period after the last parameter used:

Parameter
Number      Use or Meaning

1           Input address.

2           0 - ignore blanks
            1 - count blanks separately
            2 - count blanks as zeroes

3           Spacing
            0 - normal spacing
            1 - one table per page

4           Address of labels.

5           Variable number of weight variable.

6           Type of input
            0 - raw data
            n - where n is the number of previously computed tables.
                If n > 1, then input must be from cards, and each
                table is a separate data deck.

If both parameters 1 and 4 are cards, the labels must precede data.

Subparameters

Subparameters follow the main parameter card and can be in any order.
A period must follow each subparameter statement, though the statement
can be continued on more than one card. If the subparameter statement
is left out, the option is not used. In the following explanation
I = integer and F = real number.

Mnemonic              Use or Meaning

PER(I)(I)(I).         Per cents are requested.
                      1 - yes
                      1st integer = total per cent
                      2nd integer = row per cent
                      3rd integer = column per cent

MIN*F**F*             Minimum and maximum are given. The
MAX*F**F*             last value is propagated to any
                      remaining variables. Data will be
                      reread if either MIN or MAX is missing.

MEA(I)(I)(I)(I).      Measures are requested.
Only applicable to    0 - no
2-way tables          1 - yes
                      1st integer = chi-square (with a code of 2,
                        both chi-square and a table of expected
                        frequencies will be printed)
                      2nd integer = lambda
                      3rd integer = weighted lambda
                      4th integer = gamma

CONTROL(I)(I)...      Up to 10 control variables are allowed.
                      The I's should be the variable numbers
                      of the control variables.

ONE(I,I,I)(I,I,I)     One and only one of these two must be
TWO(I,I,I)(I,I,I)     in every program. ONE means one-way
                      tables. TWO means two-way tables.
                      In ONE, (I,I,I) specifies one range of
                      tables. In TWO, (I,I,I)(I,I,I) specifies
                      one range of 2-dimensional tables.

The notation (I,I,I) has the following meaning:   If it is absent
completely, i.e., ONE. or TWO., then all possible tables are calculated.
The first integer is the initial value, the second is the terminal
value and the third is the increment.   It means:   take all values
starting at the first integer and stepping by the third integer until
you reach the second integer.   If the third integer is missing, the
increment is taken to be one.   If the second is also missing, then
the first is taken as a single table specification.   As many ranges as
wanted can be specified subject to the following restriction:   In
the two-way tables, no more than 540 separate ranges, i.e., (I,I,I)(I,I,I),
can be specified.
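
The range notation can be illustrated with a short Python sketch. This is not part of SOUPAC; the function names are hypothetical, and the pairing of consecutive specifications as (rows)(columns) in the two-way case is an assumption read off the TWO examples in this write-up.

```python
from itertools import product

def expand_range(spec):
    """Expand one (first, last, step) specification into variable numbers."""
    first = spec[0]
    last = spec[1] if len(spec) > 1 else first   # no terminal value: single table
    step = spec[2] if len(spec) > 2 else 1       # missing increment defaults to one
    return list(range(first, last + 1, step))

def one_way_tables(*specs):
    """Tables requested by ONE(...)(...): the union of all ranges, in order."""
    tables = []
    for spec in specs:
        tables.extend(expand_range(spec))
    return tables

def two_way_tables(*specs):
    """Tables requested by TWO(...)(...): specs are taken in pairs (assumed)."""
    pairs = []
    for rows, cols in zip(specs[0::2], specs[1::2]):
        pairs.extend(product(expand_range(rows), expand_range(cols)))
    return pairs

print(one_way_tables((2, 6, 2), (9,)))
print(two_way_tables((3,), (4,), (2, 6, 2), (1, 5, 2)))
```

The first call corresponds to ONE(2,6,2)(9) and the second to TWO(3)(4)(2,6,2)(1,5,2).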


IV.   Labels 

Labels can come from cards or temporary storage.   Each label should
be treated as if it were two variables each 4 characters long.   For
example, if there are 6 variables in the input data, then there would be
twelve variables for labels and the data card would be DATA(12)(12A4).

All the labels are treated as one row of input n variables long.
Labels need not be given for each variable but if a variable is skipped
and more labels follow, then it should be replaced with eight blanks.


Examples 

1.   FRE(C).
     PER(1).
     CONTROL(1)(3).
     ONE(2,6,2)(9).
     END P

Input is from cards; per cent of totals will be printed; control variables
are 1 and 3; resulting tables are 2, 4, 6, and 9.   Blanks will be ignored.

2.   FRE(C)(2)(1)(C)(2).
     PER(1)(1)(1).
     TWO(3)(4)(2,6,2)(1,5,2).
     END P

All per cents will be given; 2 will be the weighting variable; resulting
tables will be 3 vs 4; 2 vs 1; 2 vs 3; 2 vs 5; 4 vs 1; 4 vs 3; 4 vs 5; 6 vs 1;
6 vs 3; 6 vs 5.   Tables will be printed one per page and blanks will be
counted as zeroes.

Since both labels and data are on cards the deck will look like this:

FRE(C)


END S

DATA(n)(nA4)

label for first labeled variable ... label for last labeled variable

END#

DATA(n/2)( )


END#

3.   FRE(S1)(1).
TWO. 
MEA(2)(1)(1)(1). 

END  P 

All  possible  two-way  tables  will  be  calculated;  all  four  measures  will  be 
calculated  and  the  table  of  expected  frequencies  will  be  printed.  Blanks 
will  be  counted  separately. 

4.   FRE(C)()()()()(2).
MEA(1)()(1).

END P

Input is in the form of two previously computed tables.   X² and weighted X²
will be calculated.

Since  there  are  two  tables  the  deck  will  look  like  this: 
FRE(C) 

END  S 

DATA 

END# 

DATA 


END#

VI.   Output Examples

A.   ONE-DIMENSIONAL TABLE

A one-dimensional frequency table might be output as follows:

VALUE     FREQ
    1        3
    4       25
    5       30
    9        2
TOTAL       60

This table indicates that the value 1 occurred 3 times, that the
value 4 occurred 25 times, and so on, for a total of 60.
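
As a rough illustration of what a one-dimensional table contains, the following Python sketch tallies a list of values and prints them in the same VALUE/FREQ layout. It is only a sketch: the function name and the sample data (chosen to reproduce the counts in the table above) are assumptions, not SOUPAC output.

```python
from collections import Counter

def one_way_table(values):
    """Return (sorted (value, frequency) rows, total number of observations)."""
    counts = Counter(values)
    return sorted(counts.items()), sum(counts.values())

# Hypothetical data: 3 ones, 25 fours, 30 fives and 2 nines.
data = [1] * 3 + [4] * 25 + [5] * 30 + [9] * 2
rows, total = one_way_table(data)

print(f"{'VALUE':>5} {'FREQ':>8}")
for value, freq in rows:
    print(f"{value:>5} {freq:>8}")
print(f"{'TOTAL':>5} {total:>8}")
```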

B.   TWO-DIMENSIONAL  TABLE 

A  two-dimensional  frequency  table  might  look  like  this: 


VARIABLE 1 ACROSS
VARIABLE 2 DOWN

VALUE      1      3      5

    2      1      4      3

    5      2      8      1

    7      1      2      1

SAMPLE SIZE = 23

This table would indicate that simultaneous observations of 1 for
variable 1 and 2 for variable 2 occurred once.   A value of 3 for
variable 1 at the same time as a value of 5 for variable 2 occurred
8 times.   The number of observations in the sample was 23.

I.   General Description

A.  Purpose 

The RANK ORDERING program receives as input a raw data matrix and
produces as output a matrix in which each element has been replaced
by a number denoting the rank of the element WITHIN ITS COLUMN.   In
other words, each column of the input matrix is considered a separate
variate and will be converted to a corresponding ranking.

The smallest variate-value is assigned rank 1.0, the next larger value
a rank 2.0, etc., until the largest variate-value is assigned the
highest rank.   In the case of tied values, identical ranks are assigned
to equal values, the rank-number being set equal to the average of the
ranks which would occur if the tied values were distinguishable.   This
is sometimes known as the "mid-rank method".
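
The mid-rank rule can be sketched in Python as follows. This is an illustrative sketch, not the program's own code; the function names are hypothetical, and the input is assumed to be a list of rows with every column ranked independently.

```python
def mid_ranks(column):
    """Rank one column; tied values share the average of their positions."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and column[order[end + 1]] == column[order[start]]:
            end += 1
        average = ((start + 1) + (end + 1)) / 2.0   # mean of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def rank_matrix(rows):
    """Replace each element of a row-major matrix by its rank within its column."""
    ranked_columns = [mid_ranks(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*ranked_columns)]

print(mid_ranks([10, 20, 20, 30]))
print(rank_matrix([[3, 1], [1, 1], [2, 5]]))
```

In the first call the two tied values would occupy ranks 2 and 3, so both receive 2.5.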

B.  References 

Kendall, Maurice G., Rank Correlation Methods, Charles Griffin and
Co., Ltd.   London, 1948.

II.   Restrictions 

A.  Input

The input data to this program may come from any source.   If cards
are used as input, the number of rows in the input matrix must be
specified on the data format card and the total number of elements
in the matrix may not exceed 30,000.   The maximum number of rows
for any matrix input to this program is 30,000, and the maximum
number of columns for any matrix input to this program is 450.

B.  Output

If  an  input  matrix  contains  more  than  30,000  elements,  an  automatic 
partitioning  of  the  input  data  occurs  such  that  each  partition  contains 
the  maximum  number  of  complete  columns  possible  within  the  constraint 
that  no  one  partition  may  contain  more  than  30,000  elements. 
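
To anticipate how many output addresses are needed, the partitioning rule can be sketched in Python. This assumes columns are assigned to partitions in order, each partition taking as many complete columns as fit; the function name is hypothetical and the actual program may differ in detail.

```python
MAX_ELEMENTS = 30_000   # element limit per partition stated in this write-up

def partition_columns(n_rows, n_cols, max_elements=MAX_ELEMENTS):
    """Return (first_column, last_column) pairs, 1-based, one per partition."""
    cols_per_partition = max_elements // n_rows
    if cols_per_partition == 0:
        raise ValueError("a single column exceeds the element limit")
    partitions = []
    first = 1
    while first <= n_cols:
        last = min(first + cols_per_partition - 1, n_cols)
        partitions.append((first, last))
        first = last + 1
    return partitions

# 1000 rows by 100 columns is 100,000 elements: 30 complete columns fit at a time.
print(partition_columns(1000, 100))
```

For this hypothetical matrix four partitions result, so four output addresses would be specified.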

The results of the ranking of each partition are output separately,
one partition per output address specified as a parameter on the
program parameter card.   A maximum of twenty-one such output addresses are
allowed.

CAUTION:   If partitioning is anticipated, the user should specify one output
address for each partition anticipated.   This warning applies especially in
the case where printed or punched output occurs.   Printing and punching will
occur only for the partitions for which it is specified.
The exception is for partitions over the twenty-first one.  For partitions
beyond the twenty-first, printing and punching is done if it was specified
for the twenty-first partition.  However, no partitions beyond the
twenty-first one may be stored on a peripheral device (SEQUENTIAL address).

C.   Data 

Since  all  comparisons  in  this  program  are  done  in  single  word  length 
operands,  in  some  cases  the  program  may  not  be  able  to  successfully 
differentiate  between  two  values  which  agree  through  the  first  five 
significant  digits  and  differ  in  subsequent  digits. 


III.   Parameters 

The  parameters  for  the  RANK  ORDERING  program  must  follow  the  program 
name  on  the  program  call  card  in  the  order  given  below: 

Parameter
Number       Use or Meaning

1            Input Address.

2-23         Output Address.

IV.   Special  Comments 

If the RANK ORDER correlation coefficient ρ (Spearman's rho) is desired,
the rankings should be input to the CORRELATION program (see individual
program description) and the Product Moment Correlation coefficient
obtained.
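
The equivalence relied on here, that Spearman's rho is the product moment correlation of the mid-ranks, can be checked with a small Python sketch. The helper names are hypothetical and the code stands in for the RANK ORDERING and CORRELATION programs only conceptually.

```python
from math import sqrt

def mid_ranks(values):
    """Mid-ranks: tied values get the average of the positions they occupy."""
    ordered = sorted(values)
    # index() gives the 0-based first position; a run of c ties averages to +(c+1)/2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2.0 for v in values]

def pearson(x, y):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Rank both variables, then correlate the ranks."""
    return pearson(mid_ranks(x), mid_ranks(y))

print(spearman_rho([10, 20, 30, 40], [1, 4, 9, 16]))   # monotone increasing relation
print(spearman_rho([10, 20, 30, 40], [9, 7, 5, 3]))    # monotone decreasing relation
```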

CAUTION:   Input to the CORRELATION program is restricted to 175 columns
(variables).

I.   General Description

This program is used to calculate the following:

Mean:   $\bar{X}_j = \frac{1}{N} \sum_{i=1}^{N} X_{ij}$

Standard Deviation:   $S_j = \left[ \frac{N \sum_{i=1}^{N} X_{ij}^2 - \left( \sum_{i=1}^{N} X_{ij} \right)^2}{N \cdot D} \right]^{1/2}$

where D is N or N-1 according to parameter 4

Variance:   $V_j = S_j^2$

Standardized Scores:   $z_{ij} = \frac{X_{ij} - \bar{X}_j}{S_j}$

Moving Averages:   $\bar{X}_j = \frac{1}{b} \sum_{i=1}^{b} X_{ij}$

where b = number of periods
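
A minimal Python sketch of these quantities for a single variable follows. It is not the SOUPAC implementation: the function names are hypothetical, the choice between N and N-1 mirrors parameter 4, and rescaling about a requested mean and standard deviation is assumed to mean new_mean + new_sd * z.

```python
from math import sqrt

def column_stats(column, use_n_minus_1=False):
    """Return (N, mean, standard deviation, variance) for one variable."""
    n = len(column)
    divisor = n - 1 if use_n_minus_1 else n
    if divisor == 0:
        # Single observation with N-1 requested: the write-up's NOTE says
        # mean, standard deviation and variance are set to zero.
        return n, 0.0, 0.0, 0.0
    mean = sum(column) / n
    # Sum of squares about the mean, computed from raw sums as in the formula.
    ss = max(sum(x * x for x in column) - sum(column) ** 2 / n, 0.0)
    variance = ss / divisor
    return n, mean, sqrt(variance), variance

def standard_scores(column, use_n_minus_1=False, new_mean=0.0, new_sd=1.0):
    """z-scores, optionally re-expressed about a specified mean and SD."""
    _, mean, sd, _ = column_stats(column, use_n_minus_1)
    if sd == 0.0:
        raise ValueError("standard deviation is zero; scores are undefined")
    return [new_mean + new_sd * (x - mean) / sd for x in column]

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(column_stats(scores))                              # N in the denominator
print(standard_scores(scores))                           # plain z-scores
print(standard_scores(scores, new_mean=50, new_sd=10))   # about mean 50, SD 10
```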


II.   Restrictions 

A.  The maximum number of variables is 450.

B.  Means,  standard  deviations,  and  W's  may  be  calculated  using  as  many 
as  30  control  variables.  Data  must  be  presorted  (for  instance  with 
SORT-MERGE  or  on  a  card  sorting  machine)  on  the  control  variables. 

C.  In  obtaining  moving  averages  where  nvar  =  number  of  variables 
and  b  =  length  of  period,  nvar*b  is  fixed  for  any  core  size. 

If  a  design  will  not  fit,  a  message  will  be  printed  giving  proper 
increment  for  Region.  At  200K  approximately  12000  cells  (nvar*b) 
are  available . 

D.  Moving  averages  are  exclusive  of  all  other  options. 

E.  Output  is  of  four  categories: 

1.  With  or  Without  Control  Breaks 

a.  Sample  size,  mean,  standard  deviation  and  variance. 

b.  Moving  averages:   sample  size,  mean,  standard 
deviation  and  variance. 


2.  Without Control Breaks

a.   Standard scores.   Printed output includes sample size,
mean, standard deviation and variance.   Missing data
option for mean, etc., is possible, but user is cautioned
against its use.   Means, etc., may be output to a
temporary storage location.

b.   Standard scores about a given mean and standard deviation.
Printed output includes sample size, mean, standard
deviation and variance.   Missing data option for mean,
etc., is possible.   Output may consist of standard
scores and scores about a given mean and standard deviation.
Output may be print and/or two different temporary
storage locations.   Means, etc., may be output to a
temporary storage location.


III.   Parameters 


The  parameters  appear  on  the  program  call  card  following  the 
program  name  STANDARD  SCORES  in  this  order: 

Parameter
Number       Use or Meaning

1            Input Address.   SEQUENTIAL 1-15.   Cards if only
             means, standard deviations, and variance desired,
             or if precalculated means and standard deviation
             are supplied (see parameter 10).

2            Output Address of Standardized Scores.

3            Output Address for Mean, Sample Size,
             Standard Deviation, and Variance.   Sample
             Size, Mean, Standard Deviation, and Variance
             can be put out on a temporary unit.
             Output is in the form of four column
             vectors (N, X̄_j, S_j, V_j).

4            If 1, use N-1, if 0, use N for denominator
             of standard deviations.   N-1 gives an unbiased
             estimate of the population standard deviation
             and population variance.   N gives the sample
             standard deviation and sample variance.

5            Output Address for Standard Scores about a
             specified Mean and Standard Deviation.
             SEQUENTIAL 1-13 and/or PRINT.

6            If parameter 5 is being used, place desired
             Mean between asterisks, for example, *50*.

7            If parameter 5 is being used, place desired
             Standard Deviation between asterisks, for
             example, *5*.

8            Moving Averages:   Put the number of periods
             (observations) over which it is desired that
             the data be averaged (i.e. b).   If control
             variables are being used and/or the actual
             number of observations is less than stated,
             the data will be averaged using the actual
             number of observations.

9            If set equal to 1, blanks coded as -0.0 will
             be checked for.   X̄_j, S_j, and V_j will reflect
             reduced N.

10           Input Address of means and standard deviations.
             1st row contains means, 2nd row contains standard
             deviations.  All additional rows input are ignored.
             Valid only for standard scores.

If using controls, on a separate card immediately after the STANDARD
SCORES card, list variable numbers of those variables used as controls.
For example, if controlling on variables 1, 2, and 4:

STA(T2)(P).
$C-B(1)(2)(4).

NOTE:   If there is only 1 observation and parameter 4 is set to 1, then the
mean, standard deviation and variance for that variable will be set to zero.

If  blanks  are  checked  and  standard  scores  are  requested,  those  observations 
which  have  a  blank  will  remain  blank  after  calculation  of  standard  scores. 

Example   (Use  of  Parameter  10 ) 

//  EXEC SOUP
//SYSIN DD *
MAT.

MOV(C)(S1).
END P

COR(S1)(S2).
MAT.

TEA(S2)(S3).
END P

STA(S1)(P)()()()()()()()(S3).
END S

DATA(10)(10F1.0)


END#
/*

This version of BALANOVA 5 is unchanged as far as computations are
concerned from the original program written by Paul Herzberg in August,
1966.   See his write-up for explanation and references for the calculations.

Each observation (row of data) input to this program must be
identified by a number for each factor including the replication factor.   These
numbers (which cannot be punched in I format) represent the levels of the
corresponding factors and must precede the dependent variables.   In the
output produced by the program, each factor is given a unique letter name
beginning with A.   Thus the first column of the input data corresponds to
the levels of factor A which is described on the first factor specification
card (see below).   Each additional factor is given the next letter in the
alphabet, and a corresponding factor specification card.   The dependent
variables follow the factor levels on the input data and they are numbered
from 1 to the number of dependent variables, in the output of the
program.

The following parameters follow the program
name, BALANOVA 5, with the first four parameters being required.


Parameter
Number       Use or Meaning

1            Input Address.   SEQUENTIAL 1-15 OR CARDS.

2            Number of factors counting replication
             factor if there is one.   Maximum = 10.

3            Number of dependent variables.
             Maximum = 200.

4            Number of levels of the 1st factor.

5-13         Number of levels of the 2nd - 10th factors.

14           1 if desire unweighted means analysis even though
             have proportional cell frequencies.

15           1 to suppress printing of cell means.

Following the program card is a separate subparameter card (factor
specification card) for each factor in the order in which the factors appear in
the input data.   Each card has the following parameters.

Parameter
Number       Use or Meaning

1            0 if fixed factor
             1 if random factor

2            0 if not the replication factor
             1 if is the replication factor

3-11         factors in which this factor is nested

As in other SOUPAC programs, parameters at the end of the card which are
not used may be deleted and the period appear after the last non-zero
parameter.  The factor specification cards must be followed by an END PROGRAM
card.

Chapter 1.   General Description

1.1   Introduction 

BALANOVA 5 is a general analysis of variance program applicable to a
wide range of balanced designs.   In the case of designs with a replication
factor,  BALANOVA  5  allows  inequality  in  the  number  of  replications  in  each 
cell.   If  the  number  of  replications  is  equal  or  proportional,  the  analysis 
is  handled  by  least  squares  (weighted  means).   If  the  number  of  replications 
is  not  proportional  then  an  unweighted  means  analysis  is  performed.   This 
is  an  approximation  to  the  least  squares  solution. 

BALANOVA  5  accepts  some  designs  that  are  not  completely  crossed,  namely 
those  nested  designs  in  which  all  main  factors  are  balanced.  Hence  hierarchical 
designs  are  allowed.   As  well,  repeated  measures  designs  are  allowed.   In  these 
designs  the  replication  factor  is  not  nested  in  all  the  other  factors. 

The design model may be fixed-effects, random-effects or mixed.   BALANOVA 5
automatically determines all the legal sources of variation (main effects and
interactions) and determines the correct denominator mean square for those
sources which can be tested by F test.   In order to do this, BALANOVA 5 first
generates the expected mean square table which is printed in readable form.
The method used closely follows Scheffe (1959), Chapter 8.

BALANOVA 5 will accept most of the designs described in Winer (1962),
Chapters 3, 4, 5, 6, and 7 and Lindquist (1953), Chapters 3, 5, 6, 6, 8, 9,
10,  and  13  (Types  I,  III,  V1 1.   Chapter  2  and  the  Appendix  of  this  manual 
contain  a  large  number  of  examples  drawn  from  these  two  books. 

The  author  was  somewhat  reluctant  to  develop  BALANOVA  5  and  is  hesitant 
to  encourage  its  wide  use  for  the  following  reasons: 

1.  A general program such as BALANOVA 5 encourages the use
of statistics in a "cook-book" manner.  Data is generated
to fit the input specifications of the program with no
consideration given to the theory of analysis of variance.
The experimenter who uses a computer program in this way
often neglects to consider whether the statistical test
is appropriate for the work he is interested in and whether
the assumptions needed for the test are satisfied in the
particular experiment he has used.

2.  In the author's experience, the results printed by such a
program as BALANOVA 5 have a certain finality which encourages
the user to accept the results as gospel truth.  When used
in this way, the experimenter forgets that there is a real
possibility of programming or machine error.

3.  In the particular case of analysis of variance, the idea
has become widespread that the summary table of F ratios
is the most important part of the analysis.  This is not
the case.   The most important part of analysis of variance
is the estimation of the main effects and the
interactions.   Only by looking at their size can
the experimenter evaluate what is happening in
his experiment.   In order to encourage this use
of analysis of variance, BALANOVA 5 prints a table
of marginal means which allows easy calculation of
all the effects in the experiment.   The F table is
only a set of warning signals.   A non-significant
F indicates that the corresponding differences
between effects can be attributed to sheer chance.

4.  BALANOVA 5 performs an unweighted-means analysis when
the replication numbers are non-proportional.   The
author fears that this option will be used too often and
without consideration of its dangers.   The unweighted-means
solution is often not satisfactory and references
on analysis of variance should be consulted (Scheffe,
1959, Winer, 1962, Lindquist, 1953).

BALANOVA 5 was designed to reduce the great amount of hand computation
needed in analysis of variance calculations.   It was not intended to eliminate
the necessity of the user being familiar with the theory of analysis of
variance.  It is hoped that the above comments will discourage some indiscriminate
use of BALANOVA 5.

1.2   Special  Features 

The  output  from  BALANOVA  5  consists  of 

1.  A  table  of  the  expected  mean  squares  in  readable 
form. 

2.  The  number  of  replications  in  each  cell  in  the 
case  of  designs  with  a  replication  factor. 

3.   The table of marginal means.   All means entering
in the computation of the sum of squares are printed.

4.   The analysis of variance summary table including,
for each source of variation, the sum of squares
and mean square, and for each source with denominator,
the F ratio and the probability of the chance occurrence
of the F ratio.

A  feature  of  BALANOVA  5  is  its  flexible  specification  of  analysis  of 
variance  designs,  allowing  a  wide  range  of  designs  to  be  described  by  a 
common  code . 

A large number of checks are made by BALANOVA 5 to ensure that the design
is legal and that the data correspond to the design.   Diagnostics are printed
to indicate all error conditions.

1.3   Legal Design of BALANOVA 5

Consider the following definitions, taken from Scheffe (1959).   Let
there be p factors in a design, not counting the replication factor, if
there is one.   A cell is specified by a set of p levels, one for each
factor.   The layout of the design is complete if there is at least one
observation in every cell.   The factors in such a design are completely
crossed.   If the design is complete and there is a replication factor
(i.e. all cells have at least one observation and at least one cell has
more than one observation) then the design is considered to be a Class A
design in BALANOVA 5.

There  are  many  analysis  of  variance  designs  which  are  not  complete 
in  the  above  sense.   Examples  of  incomplete  designs  are  Latin-square, 
incomplete  blocks  and  nested  designs.   The  only  incomplete  designs 
which  are  allowed  in  BALANOVA  5  are  nested  designs  which  are  balanced  in 
all  factors  except  for  the  replication  factor  (which  need  not  be  balanced). 
These incomplete designs are called Class B and C designs.   "Nesting",
"balanced" and "replication factor" are defined in the next three
paragraphs.  These definitions are illustrated in Chapter 2.

Nesting  may  be  defined  as  follows:  The  levels  of  a  factor  C  are 
nested  within  the  levels  of  a  factor  A  (in  short,  C  is  nested  within  A) 
if  and  only  if  each  level  of  C  appears  with  only  a  single  level  of  A  in 
the  observations.  Note  that  if  C  is  not  nested  within  A,  it  is  crossed 
with  A,  but  only  if  every  level  of  C  appears  with  every  level  of  A  is 
C  completely  crossed  with  A.  Latin-square  and  incomplete  block  designs 
are  only  partly  crossed. 
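
The nesting and crossing definitions can be tested mechanically on a set of observations. The Python sketch below is illustrative only: the function names are hypothetical, each observation is assumed to be a tuple of factor levels, and every level of the inner factor is assumed to carry a distinct label across the whole data set (if level numbers restart within each nest, nesting cannot be detected this way).

```python
def levels_seen_with(observations, inner, outer):
    """Map each level of factor `inner` to the set of `outer` levels it occurs with."""
    seen = {}
    for obs in observations:
        seen.setdefault(obs[inner], set()).add(obs[outer])
    return seen

def is_nested(observations, inner, outer):
    """True if each level of `inner` appears with only a single level of `outer`."""
    return all(len(s) == 1 for s in levels_seen_with(observations, inner, outer).values())

def is_completely_crossed(observations, a, b):
    """True if every level of `a` appears with every level of `b`."""
    all_b = {obs[b] for obs in observations}
    return all(s == all_b for s in levels_seen_with(observations, a, b).values())

# Column 0 is factor A, column 1 is factor C (hypothetical data).
nested = [(1, 1), (1, 2), (2, 3), (2, 4)]    # C levels 1,2 under A=1; 3,4 under A=2
crossed = [(1, 1), (1, 2), (2, 1), (2, 2)]   # every C level with every A level
partly = [(1, 1), (1, 2), (2, 1)]            # C=2 never occurs with A=2

print(is_nested(nested, 1, 0), is_completely_crossed(nested, 1, 0))
print(is_nested(crossed, 1, 0), is_completely_crossed(crossed, 1, 0))
print(is_nested(partly, 1, 0), is_completely_crossed(partly, 1, 0))
```

The third data set is neither nested nor completely crossed, which is the "partly crossed" situation described for Latin-square and incomplete block designs.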

A  nested  factor  C  is  balanced  if  the  number  of  levels  of  C  is  the  same 
within  each  combination  of  those  factors  within  which  C  is  nested  and  the 
factors  (if  any)  which  are  crossed  with  C  are  completely  crossed. 

A replication factor, in BALANOVA 5, is a factor which is nested within
one or more other factors, but not necessarily within all other factors.
Furthermore,  no  factor  may  be  nested  within  the  replication  factor.   That  is, 
a  factor  is  a  replication  factor  if  and  only  if  for  every  other  factor  A  in 
the  design,  it  is  either  nested  within  A  or  crossed  with  A.   A  replication 
factor  may  be  nested  within  some  factors  and  crossed  with  others.   There  can 
be  at  most  one  replication  factor  in  a  design. 

The  distinction  is  made  between  replication  factors  and  other  nested 
factors  in  BALANOVA  5  since  replication  factors  do  not  have  to  be  balanced. 
All  other  factors  must  be  balanced. 

Using these definitions, the following designs are legal in BALANOVA 5.

Class A designs (completely crossed with nested replications)

Class A designs contain (p + 1) factors of which p are the main factors
and the other factor is the replication factor.   The following two conditions
must both be met for the design to be Class A.

(a)  All p main factors are completely crossed.

(b)  The replication factor is nested in all main factors.

Thus one-way and factorial designs are Class A designs.

Class B designs (other replication designs)

Class B designs also contain (p + 1) factors of which p are the main
factors and the other factor is the replication factor.   However, one or
both of the two conditions, (a) and (b), are not satisfied in Class B
designs.

When  (a)  is  not  satisfied,  that  is,  the  p  main  factors  are  not 
completely  crossed,  then  the  main  factors  must  satisfy  the  following 
condition. 

(a')   Consider any two main factors, A and B.   Either A is completely
crossed with B, or A is nested within B or B is nested within
A.   This must be true for all pairs of main factors.   Furthermore,
at least one pair must have the nested relationship or
else (a) would be satisfied.

When  (b)  is  not  satisfied,  then  the  following  condition  must  be  true. 

(b')   The  replication  factor  is  nested  in  at  least  one  but  not  all 
main  factors.   Note  that  the  requirement  that  the  replication 
factor  be  nested  in  at  least  one  factor  is  part  of  the  basic 
definition  of  a  replication  factor. 

Class  B  designs  then  can  be  of  the  following  two  types. 

Hierarchical  designs:  (a')  and  (b)  are  satisfied.  The  replication 
factor  is  nested  in  all  factors  but  there  is  some  nesting  among  the  main 
factors. 


Repeated measures designs:   (b') is satisfied.   Either (a) or (a') can
be satisfied.   The necessary feature (b') of repeated measures designs is
that the replication factor is crossed with one or more of the main factors.
The factors in which the replication factor is nested may themselves be either
crossed (a) or nested (a').

Class  C  designs  (no  replication  factor) 

Class  C  designs  have  p  factors  and  there  is  no  replication  factor. 
All  factors  must  be  balanced.   For  each  pair  of  factors,  e.g.  factors  A 
and  B,  either  A  is  completely  crossed  with  B,  or  A  is  nested  within  B  or 
B  is  nested  within  A.   There  does  not  necessarily  have  to  be  any  nesting 
at  all. 




In summary, then, designs are classed in the following way in BALANOVA
5.   Class A and B designs have a replication factor, Class C designs do not.
Class A designs are distinguished from Class B designs in that a Class A
design must have 1) all main factors completely crossed and 2) the replication
factor nested in all main factors.   Class B designs violate one or both
of these requirements.

In Class A and B designs, the replication factor does not need to be
balanced.   However, all nested factors, except the replication factor (if
any), must be balanced.   Recall that in Class A designs, the replication
factor is the only nested factor.

As  explained  above,  the  replication  factor  is  distinguished  from 
other  nested  factors  since  it  does  not  have  to  be  balanced.   There  are 
two  other  reasons  for  distinguishing  the  replication  factor  from  other 
nested  factors.   These  reasons  are  important  even  if  the  replication 
factor  is  balanced. 

1.  In  Class  A  designs  (completely  crossed  with  replications)  only 
cell  means  are  stored  in  the  computer  and  thus  very  large  designs 
can  be  accommodated.   The  allowable  number  of  replications  in 
each  cell  is  virtually  unlimited. 

2.  For all replication designs, whether of Class A or B, the level
number for the replication factor in each nest does not need
to run from one up to the maximum number of levels in each
nest as it does for all other factors.   Any convenient numbering
of the replications may be used (e.g. a unique number for every
subject in the experiment, regardless of the nest within which
he is).   This feature of BALANOVA 5 is especially useful when
several dependent variables are analyzed and there is missing
data for some of the subjects for some of the dependent variables.

1.4   Calculations for equal and unequal number of replications

The calculations performed by BALANOVA 5 for designs with a replication
factor (Class A and B designs) depend on whether the numbers of replications
in each cell are equal or unequal.   If the numbers are equal, the standard
analysis of variance calculation is made (least-squares or weighted means
analysis).   If the numbers are unequal, a check is first made to see if the
cell N's are proportional.   In a two-way analysis of variance, for example,
the cell N's are proportional if the number of replications in the ij cell,
$N_{ij}$, satisfies

$$N_{ij} = \frac{N_{iT} \times N_{Tj}}{N_{TT}}$$

where the T's indicate marginal totals.   If the cell N's are proportional,
BALANOVA 5 makes the least-squares calculations, i.e. weighted means are
used.   If the cell N's are not proportional, then the method of unweighted
means is used (See Scheffe, pp. 262-3 or Winer, pp. 222-4).

In general, if i, j, k, ..., λ are those factors within which the
replication factor is nested (not necessarily all the factors in the design),
and if $N_{ijk\ldots\lambda}$ is the number of replications in a particular nested cell,
then the cell N's are proportional if, for all combinations ijk...λ,

$$N_{ijk\ldots\lambda} = \frac{N_{iTT\ldots T} \times N_{TjT\ldots T} \times \cdots \times N_{TTT\ldots\lambda}}{\left( N_{TTT\ldots T} \right)^{q-1}}$$

In this formula, the T's indicate marginal totals and q is the number of
factors within which the replication factor is nested.   In particular, the
one-way analysis with unequal N's is a proportional design (i.e. the cell
N's are proportional) by this definition, since

$$N_i = \frac{N_i}{(N_T)^0} = N_i$$

In fact any design in which the replication factor is nested in only one
factor is a proportional design.
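
The proportionality condition can be checked directly from a table of cell counts. The following Python sketch is illustrative only (the function name and the dictionary layout are assumptions); it uses exact fractions so that no rounding enters the comparison, and it treats a missing cell as having zero replications.

```python
from fractions import Fraction
from itertools import product
from math import prod

def is_proportional(cell_counts):
    """cell_counts maps a tuple of levels (i, j, ..., λ) to its number of replications."""
    q = len(next(iter(cell_counts)))          # number of factors the replications nest in
    total = sum(cell_counts.values())         # N with every subscript replaced by T
    marginals = [{} for _ in range(q)]        # one table of marginal totals per factor
    for levels, n in cell_counts.items():
        for axis, level in enumerate(levels):
            marginals[axis][level] = marginals[axis].get(level, 0) + n
    for levels in product(*(sorted(m) for m in marginals)):
        numerator = prod(marginals[axis][level] for axis, level in enumerate(levels))
        expected = Fraction(numerator, total ** (q - 1))
        if cell_counts.get(levels, 0) != expected:
            return False
    return True

print(is_proportional({(1, 1): 2, (1, 2): 4, (2, 1): 3, (2, 2): 6}))   # rows in ratio 2:3
print(is_proportional({(1, 1): 2, (1, 2): 4, (2, 1): 3, (2, 2): 5}))   # one cell changed
print(is_proportional({(1,): 3, (2,): 7}))                             # one-way, unequal N's
```

With q = 1 the denominator is raised to the power zero, so any one-way layout passes, in line with the remark that such designs are always proportional.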

1.5   Specification of a Design

Any design is described by listing the following information about each
factor in the design, including the replication factor if there is one.   The
information for each factor is punched on a separate card (a factor
specification card), and the cards should be in the same order as the factors are
in the input data.   Each parameter should be enclosed in parentheses, and
each card terminated by a period.


Parameter  1 


Type  of  factor.   The  first  parameter  on  each 
factor  specification  card  should  be  a  zero 
if  the  factor  is  fixed,  and  a  one  if  it  is 
random.   The  replication  factor  is  always  a 
random  factor.   At  least  one  factor  in  every 
design  must  be  random. 


Parameter  2 


Replication  Factor.   If  the  design  has  a 
replication  factor,  this  is  indicated  by 
punching  a  one  for  the  second  parameter. 
A  design  may  have  only  one  replication  factor. 
If  there  is  no  replication  factor,  the  second 
parameter  should  be  zero  (or  blank)  on  all 
of  the  factor  specification  cards. 


Parameters 3-11


Nesting.  The factors in which the given factor
(the one to which this card refers) is nested
are listed.  Factors are numbered from one
through the number of factors in the design.
If the factor is not nested, parameters 3-11
may be completely omitted.
An example of this way of specifying a design will now be given.
Consider a two-way analysis of variance with subjects within cells.  The
design is considered to have three (not two) factors, namely A and B,
the main factors, and C, the replication factor.  Suppose there are 3
levels of A and 4 levels of B and that each cell has 10 subjects.  The
cards used to perform this analysis are listed below.  Each line corresponds
to one IBM card.

TRANSFORMATIONS(CARDS)(SEQUENTIAL 1)(4).
END PROGRAM
BALANOVA(SEQUENTIAL 1)(3)(1)(3)(4)(10).
(0)(0).
(0)(0).
(1)(1)(1)(2).
END PROGRAM

The first card listed above calls the TRANSFORMATIONS program and uses
it to store raw data from cards onto sequential file number 1 (SEQUENTIAL 1).
The number four is the total number of variables, independent (3) + dependent
(1).  Since no transformations are performed, the END PROGRAM card immediately
follows the main program card.

The third card listed above calls the BALANOVA program.  The first
parameter is the location of the data (SEQUENTIAL 1), the second is the number
of factors (3), the third is the number of dependent variables (1), the fourth
is the number of levels of the first factor (3), followed by the number of
levels of the second factor (4), and finally the number of levels of the
last factor, which is the maximum cell size (10) when considering the
replication factor.

The fourth card is the factor specification card for factor 1.  The first
two parameters are zero, labeling this factor as fixed, and as not being the
replication factor.  The fifth card is the factor specification card for factor
2, which is also fixed, and not the replication factor.  Note that parameters
3-11 are blank, as factors 1 and 2 are not nested in any other factors.

The sixth card is the factor specification card for factor 3.  Its four
parameters denote it as a random factor, as the replication factor, and as
nested in factors 1 and 2.  The seventh card terminates the BALANOVA program.
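
The card layout just described is regular enough to generate by program. The following is a minimal sketch, not a SOUPAC facility: the helper names `factor_card` and `balanova_card` are hypothetical, and the only assumptions are the rules stated in this section (parameters in parentheses, card terminated by a period, nesting parameters omitted when a factor is not nested).

```python
def factor_card(random, replication=False, nested_in=()):
    """Format one factor specification card.

    Parameter 1: 0 = fixed, 1 = random.
    Parameter 2: 1 if this is the replication factor, else 0.
    Parameters 3 onward: numbers of the factors this one is nested in.
    """
    params = [1 if random else 0, 1 if replication else 0, *nested_in]
    return "".join(f"({p})" for p in params) + "."


def balanova_card(data_address, n_dependent, levels):
    """Format the BALANOVA call card; levels holds one entry per factor."""
    params = [data_address, len(levels), n_dependent, *levels]
    return "BALANOVA" + "".join(f"({p})" for p in params) + "."


# The two-way design with subjects within cells from the example above.
deck = [
    balanova_card("SEQUENTIAL 1", 1, [3, 4, 10]),
    factor_card(random=False),                                      # factor A
    factor_card(random=False),                                      # factor B
    factor_card(random=True, replication=True, nested_in=(1, 2)),   # subjects
    "END PROGRAM",
]
print("\n".join(deck))
```

Generating the cards this way keeps the factor count on the call card and the number of factor specification cards in step.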
Chapter 2.  Design Examples
OMITTED TEMPORARILY

Chapter 3.  Preparation of Input

3.1  Introduction

The following rules apply to the assignment of factor levels in all
types of designs.  However, there is usually no need to rekeypunch existing
data, as it is almost always possible to create the factor levels in
the TRANSFORMATIONS program.  If you need help using this program, see a
SOUPAC consultant.

(a)   Non-replication Factors in Class A, B and C designs

The levels for non-replication factors must run from one (1)
consecutively up to the number of levels given on the BALANOVA call card.
E.g., if a factor represents four treatment groups, these groups must be
numbered 1, 2, 3, and 4 and each subject's row or rows in the data matrix
must have a 1, 2, 3, or 4 punched to indicate the group he is in.  If the
factor is nested, the level numbers must run from one (1) up, in each
cell of the nest.  See the example given in Section 3.2.

(b)   Replication Factors in Class A designs

The replication numbers (level numbers) can be anything, for example,
a subject identification number.  The subject numbers do not have to be
unique either in a group or between groups.  In fact,
in Class A designs the replication level is not used, but it must
nevertheless appear, even if it is a dummy.  This statement does not apply to
other design classes.

(c)   Replication Factors in Class B designs, of repeated measures type

Special care must be taken with the replication levels in these designs.
Let us divide the non-replication factors into two groups:

α-set:   those factors in which the replication factor is nested.

β-set:   all other factors, i.e. those factors crossed with the
replication factor.

If the β-set is empty, the design is of hierarchical rather than repeated
measures type.  See paragraph (d) below.

Let us denote by an α-cell a particular set of levels of the factors in
the α-set.  The replications in this cell may be any values (not necessarily
from one (1) up) but must be distinct.  Again, the numberings in two different
α-cells do not have to be distinct, but can be.  In other words, if the
replication factor is subjects, an identification number may be used as the
replication level.  Each subject appears in more than one row (card) of
the data matrix, since each subject appears with every combination of levels
of the factors in the β-set.  Every row that refers to the same subject
must therefore have the same replication number.  This is
the only way that BALANOVA 5 can tell that two different rows refer to the
same subject.
(d)   Replication Factors in Class B designs, of hierarchical type

The  subject  numbers  must  all  be  different  within  any  one  cell  (one 
level  set  of  the  non-replication  factors)  but  may  be  the  same  over 
different  cells. 

Special  note  on  missing  data 

If a dependent variable field on a card is totally blank, BALANOVA 5
does not include the score in the analysis for the given dependent variable.
However, other non-blank dependent variable fields on the same card will be
included in their respective analyses.

Do not confuse this deletion of missing data with an error comment by
BALANOVA 5 to the effect that there is no data in cell A = 1, B = 2.  This
comment means that no data card with A = 1, B = 2 had non-blank data for
the given dependent variable.

3.2  Data matrix examples

Class  A  design 

In all the following examples, it is assumed data is stored on
sequential file number 1 (SEQUENTIAL 1).

Consider  a  two-way  design  with  three  subjects  in  each  cell.   For  the 
purpose  of  BALANOVA,  subjects  are  also  considered  to  be  a  factor,  the 
replication  factor.   Suppose  that  there  are  two  dependent  variables,  and 
further  that  the  factor  specification  cards  are  listed  in  the  order  given 
below,  following  the  main  program  card. 

BALANOVA(SEQUENTIAL 1)(3)(2)(2)(3)(3).
(0)(0).
(1)(1)(1)(3).
(0)(0).
END PROGRAM

Note that, contrary to the usual case, the replication factor is the second factor.
This illustrates one flexible feature of BALANOVA 5.  A data matrix could be:

1       1       1       20      19 
12       1       25       2


2 

13 

2 
15

2 

lU 

2 

2 

20 

2 

15 

2 

3 

3 

2 

16 

3 

1+ 

k 

2 

17 

3 

5 

-1 

2 

18 

3 

3 

2 

Note  that  the  first  column  is  the  A  level,  the  third  column  is  the  C  level 
and  the  second  column  is  the  replication  level,  which  in  Class  A  designs  can 
be  anything.   The  last  two  columns  are  the  dependent  variables.   Each  row 
of  the  data  matrix  would  be  punched  on  one  or  more  cards.   A  possible  format 
would be (3F5.0,3X,2F6.0).

The  order  of  the  rows  is  immaterial.   They  could  be  in  any  order  and 
have  been  written  in  a  systematic  order  only  for  convenience. 

Class  B  design  -  repeated  measures 

The data in Winer, Table 7.2-3 could be analyzed with the following
program, using three factors and one dependent variable.

BALANOVA(SEQUENTIAL 1)(3)(1)(2)(4)(3).
(0)(0).
(0)(0).
(1)(1)(1).
END PROGRAM


Data Cards:

1    1    1    0
1    2    1    0
1    3    1    5
1    4    1    3
1    1    2    3
1    2    2    1
1    3    2    5
1    4    2    4
etc.
2    1    5    5
2    2    5    4
2    3    5    6
2    4    5    6
2    1    6    7
2    2    6    5
2    3    6    8
2    4    6    9

Again, the rows could be in any order.  The subjects in the second level of factor
A could be assigned level numbers 1, 2, 3 or any other three distinct numbers.
A possible format for this matrix is (2F5.0,F6.0,1X,F7.0).

Another repeated measures design is Winer, Table 7.4-3.  Here we have
four factors and one dependent variable.
BALANOVA(SEQUENTIAL 1)(4)(1)(2)(2)(3)(4).
(0)(0).
(0)(0).
(1)(1)(1)(2).
(0)(0).
END PROGRAM


Data Cards:

1    1     1    1    18
1    1     1    2    14
1    1     1    3    12
1    1     1    4     6
1    1     2    1    19
1    1     2    2    12
1    1     2    3     8
1    1     2    4     4
etc.
1    2     6    1    18
1    2     6    2    10
1    2     6    3     5
1    2     6    4     1
2    1     7    1    16
2    1     7    2    10
2    1     7    3     8
2    1     7    4     4
etc.
2    2    12    1    16
2    2    12    2    12
2    2    12    3     8
2    2    12    4     8

The assignment of levels to factors A, B, D (columns 1, 2, 4) has to be as
shown above, but the subject numbers can be changed provided that, for each
(A,B) cell, no two subjects have the same number.  A possible format is
(2F5.0,1X,2F5.0,F12.0).

Class  B  design  -  hierarchical 

Consider the following design with three factors and one dependent
variable.

BALANOVA(SEQUENTIAL 1)(3)(1)(2)(2)(4).
(0)(0).
(0)(0)(1).
(1)(1)(1)(2).
END PROGRAM

This  is  a  hospitals  (factor  2)  within  drugs  (factor  1)  design,  illustrated 
below,  with  unequal  cell  size. 
Cards:

1    1    1    5.0
1    1    2    4.2
1    2    1    5.6
1    2    2    3.2
1    2    3    4.6
2    1    1    5.3
2    1    2    8.2
2    1    3    4.3
2    1    4    6.3
2    2    1    5.7
2    2    2    6.8

Note that the hospitals are numbered 1, 2 in each level of factor 1 even though
there are four different hospitals involved.  This is necessary since factor 2
is a non-replication factor; see the rules about Data Cards in Section 3.1.
There are 2 patients in hospital 1 for drug 1, 3 patients in hospital 2 for
drug 1, 4 patients in hospital 1 for drug 2 and 2 patients in hospital 2 for
drug 2.  The patient numbering is flexible: it could be a different number
for every patient, regardless of hospital.  A possible format is (3F5.0,F7.1).

Class  C  design 

The example in Winer, Table 4.3-1, could be set up as follows, with two factors
and one dependent variable.

BALANOVA(SEQUENTIAL 1)(2)(1)(5)(4).
(1)(0).
(0)(0).

END  PROGRAM 


Data Cards:

The rows have been written in a non-systematic order to emphasize that,
without exception, in BALANOVA 5 the data rows can be in any order.  A
possible format is (2F5.0,F10.0).
Chapter 4.  Program Details

4.1  Method and program flow

The  program  follows  the  procedures  in  Scheffe  (1959) ,    Chapter  8.   Scheffe's 
discussion  will  not  be  repeated  here,  but  only  a  general  description  of  the 
program  flow  will  be  given.   The  names  of  the  subroutines  used  are  indicated 
in  case  reference  is  made  to  the  program  listing.   Many  of  the  minor  steps 
and  subroutines  are  not  described. 

1.   Main Program

Calls subroutines and routes your data through the program.

2.   Design input and check (INPUTD)


The Factor Specification Cards are read and checked for errors.  Many
of the error conditions mentioned in Section 4.3 are checked in INPUTD.
The design is transformed into the symbolic notation of live, dead and
absent subscripts as in Scheffe.

3.   Derivation of all legal sources (LEGALS,NEWS)

All possible interactions are generated but only one interaction with
a given set of subscripts is retained.  The procedure is identical to Scheffe,
p. 277, para. 1.  The program now has a list of all legal sources (including
the original factors).

4.   Expected mean squares (AUXIL,EMS)

The expected mean squares for each source are, of course, not computable
numbers, but rather symbolic expressions (cf. last column, Table 8.2.2,
Scheffe).  The program generates and prints these expressions in a form very
close to the normal printed form.  The method is from Scheffe, pp. 284-8.

5.   Denominator for each source (FINDEN)

By  the  standard  procedures,  using  the  expected  mean  squares,  the  program 
determines  the  correct  denominator  (if  any)  for  each  source. 

6.   Sorting  of  sources  for  summary  table  (SORT) 

The  sources  are  sorted  in  a  convenient  order,  combining  all  sources 
with  the  same  denominator.   This  order  is  then  used  in  printing  the  summary 
table . 

7.   Input of data (INPUTX)

The input data is read from the input device (the first parameter on the
main program card).  The grand means for each dependent variable are computed,
ignoring missing (blank) data.
8.  Storage of data for one dependent variable (READX)

This routine, as well as all the remaining ones, is executed in a cycle,
once for each dependent variable.  READX stores the data in core and checks
that no data is missing in the design.  The data is actually stored as
deviations from the grand mean.  This is done to improve accuracy.  See
Section 4.2.

9.  Check  of  replication  numbers  (CELLN) 

In the case of Class A and B designs, BALANOVA 5 checks whether the
cell frequencies are equal, proportional or non-proportional.

10.  Computation of sum of squares (SSEQU,SSPROP,XMEAN)

The  marginal  means  and  sums  of  squares  for  each  legal  source  are 
calculated. 

11.  Computation  and  printing  of  final  summary  table  (FISHER, FPRINT) 
These  calculations  are  made  in  the  standard  way. 

4.2  Some comments on accuracy of computation

An  attempt  was  made  in  the  design  of  this  program  to  eliminate  the 
largest  sources  of  computational  inaccuracies  that  can  occur  in  analysis  of 
variance  calculations. 

Consider  a  one-way  analysis  with  the  following  data: 

            Group 1   Group 2   Group 3

             8.77      8.88      8.96
             8.79      8.90      8.99
             8.80      8.90      9.00
             8.82      8.91      9.02
             8.82      8.91      9.03

Means        8.80      8.90      9.00

Sums of squares computations are generally made as the sums and
differences of two or more terms.  In this example, the exact calculations
would be

SS between = 1188.2500 - 1188.1500
           = 0.1000

SS within  = 1188.2554 - 1188.2500
           = 0.0054

Note that these answers each have a string of zeros following the given digits
since they are exact.  However, on a computer with about 8 digit accuracy, the
differences would only be accurate to about four decimals due to the cancellation
of all the higher order digits by subtraction.
This is illustrated by a calculation using the previous analysis of
variance program in SOUPAC.  SOUPAC's answers were:

SS between = 0.10000610         (5 significant digits)
SS within  = 0.0053863525       (2 significant digits)

Note  the  large  errors  in  these  SS.   Even  worse  errors  can  occur  in  other  data. 

The  whole  problem  could  be  avoided  by  accumulating  a  true  sum  of  squares, 
that  is,  by  adding  positive  numbers  to  form  each  SS  rather  than  taking  a 
difference  of  two  large  numbers.   However  this  procedure  was  rejected  because 
it  is  extremely  slow. 

The  following  procedure  is  used  in  BALANOVA  5  and  it  is  very  effective. 
The  data  are  internally  transformed  to  deviations  from  the  grand  mean.   This 
is  why  the  grand  means  are  computed  in  subroutine  INPUTX  before  the  deviations 
are  actually  stored  in  the  memory  in  subroutine  READX.  When  the  deviations 
are  used  the  individual  terms  which  are  added  and  subtracted  to  give  each 
SS  are  now  numbers  of  approximately  the  same  size  as  the  SS  itself.   This 
means  that  the  number  of  significant  digits  in  the  SS  is  large  even  if  the 
grand  mean  is  large.   In  the  example  given  above,  the  deviation  scores  are: 


            Group 1   Group 2   Group 3

             -.13      -.02      +.06
             -.11       .00      +.09
             -.10       .00      +.10
             -.08      +.01      +.12
             -.08      +.01      +.13

Means        -.10       .00      +.10


and  the  SS  are  computed  as 

SS between = 0.10000000 - 0.00000000
           = 0.10000000

SS within  = 0.10540000 - 0.10000000
           = 0.00540000

The  actual  results  produced  by  BALANOVA  5  were 

SS  between  =  0.099999998  (8  significant  digits) 

SS  within  =  0.0053999992         (7  significant  digits) 

Note  the  great  improvement  in  accuracy. 
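
The deviation-score computation can be reproduced in a few lines. This is a sketch of the arithmetic only, not of the SSEQU or READX subroutines, and it runs in ordinary double precision rather than the single precision of the 360; the function name `sums_of_squares` is hypothetical. It uses the three-group example data (group means 8.80, 8.90 and 9.00).

```python
groups = [
    [8.77, 8.79, 8.80, 8.82, 8.82],
    [8.88, 8.90, 8.90, 8.91, 8.91],
    [8.96, 8.99, 9.00, 9.02, 9.03],
]


def sums_of_squares(groups):
    """One-way SS between and SS within, computed from deviation scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    # Work with deviations from the grand mean so every term is small.
    dev = [[x - grand_mean for x in g] for g in groups]
    total_term = sum(d * d for g in dev for d in g)
    group_term = sum(sum(g) ** 2 / len(g) for g in dev)
    # The correction term is essentially zero once the data are centered.
    correction = sum(d for g in dev for d in g) ** 2 / len(scores)
    return group_term - correction, total_term - group_term


ss_between, ss_within = sums_of_squares(groups)
print(round(ss_between, 8), round(ss_within, 8))
```

Because the terms being subtracted are now about the size of the answer, very few leading digits cancel.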

As  a  final  feature  of  BALANOVA  5,  the  approximate  number  of  significant 
digits  in  each  SS  is  calculated  and  printed  alongside  each  SS.   These  numbers 
should  not  be  interpreted  exactly  but  only  as  a  warning  when  they  are  small. 
The  approximate  number  of  significant  digits  is  calculated  in  the  following  way: 
(a)  Find the largest term, in absolute value, entering into the
calculation of the SS.  In the example above, the largest
term in SS within is the first term (0.10540000).

(b)  Take the ratio of this largest term to the SS itself.  In the
example, this ratio = 0.10540000/0.00540000.

(c)  The approximate number of significant digits is then 8.0 - log10
(ratio).  In the example, this is 8.0 - 1.29 = 6.71, which is
printed by BALANOVA 5 as 7, a pretty good estimate.

Note  that  the  number  of  significant  digits  printed  by  BALANOVA  5  reflects 
the  loss  of  accuracy  in  the  computation  of  the  SS  from  two  or  more  terms. 
It  does  not  reflect  loss  of  accuracy  due  to  computation  of  the  terms  them- 
selves. 
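
Steps (a) to (c) amount to a one-line formula. The sketch below is an illustration under the stated rule, with a machine precision of 8 decimal digits taken as given; the function name `approx_significant_digits` is hypothetical and does not correspond to a BALANOVA subroutine.

```python
import math


def approx_significant_digits(terms, machine_digits=8.0):
    """Estimate the significant digits left after summing signed terms.

    terms holds the signed terms whose sum is the SS, for example
    [0.1054, -0.1000] for SS within in the example above.
    """
    ss = sum(terms)
    largest = max(abs(t) for t in terms)
    # Digits lost to cancellation = log10(largest term / result).
    return machine_digits - math.log10(largest / abs(ss))


digits = approx_significant_digits([0.1054, -0.1000])
print(round(digits))
```

The estimate is undefined when the SS is exactly zero, so a caller would need to treat that case separately.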

4.3  Error conditions

BALANOVA  5  makes  a  detailed  check  to  insure  that  the  design  is  legal, 
that  none  of  the  computer  storage  arrays  are  exceeded  and  that  all  the  data 
corresponds  to  cells  within  the  specified  design.   The  following  general 
types  of  errors  are  distinguished  and  corresponding  error  messages  are 
printed  giving  detailed  instructions  about  how  to  correct  the  error. 

1.  One  of  the  restrictions  on  program  size  has  been  exceeded.   These 
restrictions  are: 

(a)   maximum  number  of  factors  =  10 

(b)  maximum number of legal sources = 100

(c)  maximum  size  of  X- storage  array  (used  for  data,  means  and 
cell  numbers)  =  10,000 

(d)  maximum  number  of  dependent  variables  =  200 

(e)  maximum  number  of  sigma-squared  terms  in  any  one  expected 
mean  square  =  10 

2.  The factor specification cards are incorrect or inconsistent.
That is, the design is illegal.  The checks made are:

(a)  all  nested  factors  must  be  listed  as  a  factor. 

(b)  no  factor  may  be  nested  within  itself. 

(c)  at  most  one  factor  can  be  the  replication  factor. 
Furthermore,  the  replication  factor  must  be  nested 
in  at  least  one  other  factor  and  no  factor  can  be 
nested  in  the  replication  factor. 
(d)  the factor type must be fixed or random.

(e)  the maximum number of levels for each factor must be
more than one.

(f)  there must be at least one denominator term in the
analysis of variance summary table.  If this is not
the case it is probably due to no factor being designated
as a random factor.

3.   A  Data  Card  has  a  level  set  which  exceeds  the  limits  stated  in 
the  maximum  number  of  levels  on  the  Factor  Specification  Cards. 

4.   Once the data for a dependent variable has been read in, a
detailed  check  is  made  to  insure  that  all  cells  in  the  design  are  filled. 
If  one  cell  is  not,  the  calculation  for  that  design  is  deleted  and  the 
program  moves  on  to  the  next  dependent  variable  after  printing  sufficient 
information  for  the  user  to  locate  the  missing  datum.   An  additional  check 
for  Class  B  and  C  designs  is  made  to  insure  that  data  for  a  given  subscript 
set  is  not  read  in  twice.   If  two  data  cards  specify  the  same  level  set, 
a  comment  is  made  to  this  effect  and  the  calculations  for  the  dependent 
variable  are  deleted.   Note  that  both  the  checks  mentioned  in  this  paragraph 
are  made  independently  for  each  dependent  variable  and  are  made  after  the 
missing  data  (blank  fields)  for  that  dependent  variable  have  been  deleted. 
Errors  referred  to  in  this  paragraph  are  not  fatal  and  the  program  proceeds 
to  the  next  dependent  variable. 

5.   About  one  dozen  other  checks  are  made.   They  should  always  be 
passed  satisfactorily  since  the  design  is  first  checked  as  above.   These 
additional  checks  were  inserted  to  assist  in  debugging  the  program  and 
if  one  of  them  fails  it  indicates  a  remaining  error  in  the  program.   A 
printed  message  is  made  to  this  effect  in  these  cases. 
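
The data checks in items 3 and 4 can be pictured as follows. This is a simplified sketch, not the logic of INPUTD or READX: it assumes every factor's levels run from 1 up to the stated maximum and that each level set should appear exactly once (as in a Class C design), and the function name `check_cells` is hypothetical.

```python
from collections import Counter
from itertools import product


def check_cells(level_sets, max_levels):
    """Return (out_of_range, missing, duplicated) level sets.

    level_sets: one tuple of factor levels per data card.
    max_levels: maximum number of levels for each factor.
    """
    counts = Counter(level_sets)
    # Item 3: a level set exceeding the stated number of levels.
    out_of_range = [
        ls for ls in counts
        if any(not 1 <= lev <= m for lev, m in zip(ls, max_levels))
    ]
    # Item 4: every cell of the design must be filled ...
    all_cells = product(*(range(1, m + 1) for m in max_levels))
    missing = [cell for cell in all_cells if cell not in counts]
    # ... and no level set may be read in twice.
    duplicated = [ls for ls, n in counts.items() if n > 1]
    return out_of_range, missing, duplicated


cards = [(1, 1), (1, 2), (2, 1), (2, 1), (3, 1)]
out_of_range, missing, duplicated = check_cells(cards, (2, 2))
print(out_of_range, missing, duplicated)
```

In BALANOVA 5 these checks are repeated for each dependent variable after blank fields have been dropped, so the same deck can pass for one variable and fail for another.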

4.4  Program Checkout

BALANOVA  5  has  been  checked  on  a  large  number  of  designs.   Among  these, 
the  following  calculations  were  reproduced  by  BALANOVA  5 : 

1.  Lindquist,  p.  266,  Class  A. 

2.  Winer, Table 7.8-3 (p. 376), both least-squares and unweighted
means, Class B.

4.5  Number of levels of the replication factor

The rules in Section 1.5 and Chapter 3 are strict in the sense that,
if they are followed, BALANOVA 5 will execute correctly.  However, the rules
may be relaxed or ignored in the case of the number of levels of the
replication factor, and it is sometimes convenient to do so.

In Class A designs, any number of levels of the replication factor may
be punched on the Factor Specification Card provided the number is ≥ 2.  This
is so because in Class A designs only cell means are stored and the program
does not check the replication number anyway.  This rule relaxation is
useful when it is inconvenient for the user to calculate ahead of time how many
subjects are in each cell.

In Class B designs, any number of levels of the replication factor may
be punched on the Factor Specification Card provided

(a)   the number is ≥ the maximum number of replications
in any one nest, and

(b)   the number is not so large that the restriction on
the size of the X matrix (10,000) is exceeded.

Again this rule relaxation saves the user from having to know the maximum
number of replications before using BALANOVA 5, provided he knows an upper
limit.  The restriction on the size of the X matrix will not often be
exceeded.  However, the user has not been informed, in this manual, how to
estimate the size of the X matrix needed, since this limit is complicated
to specify.
Appendix.   Key to Designs in Winer and Lindquist
OMITTED  TEMPORARILY 


CLASSIFICATION 


I.   General Description

The CLASSIFICATION program is designed to measure individuals against
previously determined groups in order to determine probable group
membership.  The classification is done in a reduced test space derived from
discriminant analysis.  The method of classification is based on the
premise that a group is totally described by its mean (or centroid) and
dispersion; the individual's relation to each group is determined by a
χ² which indicates how many members of the group are farther from the
centroid than he, and a Bayesian probability of membership in the group
based on this χ².  For each individual, the χ² and probability for each
group are given; the user then applies a decision rule of his choice
for assigning individuals to groups.

Since analysis is to be performed in a reduced space, the means and
dispersion of each group must also be reduced to this space.  The
CLASSIFICATION program performs these reductions.

The  calculation  of  probabilities  requires  the  specification  of  the 
number  of  members  in  each  group  against  which  the  individual  is  being 
compared.   This  may  be  the  numbers  actually  in  the  groups  used  for 
finding  experimental  means  and  standard  deviations  or  the  number  of 
individuals  from  the  total  group  being  tested  who  are  to  be  assigned 
to each group.  These numbers are specified as subparameters.

In  every  case  the  input  is  expected  in  the  form  in  which  it  is 
output  by  the  DISCRIMINANT  ANALYSIS  program.   Input  is  limited  to  16 
groups  and  50  variables. 

II.   Parameters

The  program  name  CLASSIFICATION  is  followed  by  these  parameters  on 
the  program  call  card: 

Parameter
Number       Use or Meaning

1            Input Address of discriminant vectors.
             The vectors are expected as columns.
             CARDS or SEQUENTIAL 1-15.

2            Input Address of means.  The means for a
             given group are expected as a row.  CARDS
             or SEQUENTIAL 1-15.

3            Input Address of dispersion matrices.  The
             dispersion matrices are expected in a
             vertically augmented form.  CARDS or
             SEQUENTIAL 1-15.

4            Input Address of individual scores.  CARDS
             or SEQUENTIAL 1-15.

5            Output Address for inverse of dispersion
             matrix.  PRINT or SEQUENTIAL 1-15.

6            Output Address for χ² and probabilities for
             each subject.  SEQUENTIAL 1-15 and/or PRINT.
             (Output on SEQUENTIAL is in the form
             $N_i, X_{i1}, \ldots, X_{in}, Y_{i1}, \ldots, Y_{in}$
             where $N_i$ = sequential subject number in groups,
             $X_{ij}$ = $j$th probability for the $i$th subject,
             $Y_{ij}$ = $j$th χ² for the $i$th subject.)

7            Number of groups.

8            Number of variables.

9            Input Address of group sample sizes.
             CARDS or SEQUENTIAL 1-15. *


*  Must be a row vector (1 observation) with the number of discriminant
vectors as the first variable.  The form output by the Discriminant Analysis
Program may be used.  If provided independent of Discriminant Analysis,
be sure to use single precision integer mode (I-format, fixed mode).
If sample sizes and any other input are from cards, sample sizes are
given first.
DISCRIMINANT ANALYSIS

I.   GENERAL  DESCRIPTION 

Suppose  that  we  have  k  populations  (groups)  and  p  measures  (variables) 
on  each  member  of  each  population.  We  want  to  test  the  hypothesis  that  our 
groups  are  significantly  different  on  the  entire  set  of  variables.   This 
one-way  multivariate  analysis  of  variance  hypothesis  is  tested  by  this 
program.   The  program  then  locates  the  dimensions  (discriminant  functions) 
along  which  the  group  differences  are  maximum.   Thus,  we  need  some 
function  to  transform  the  p  variates  into  a  smaller  set  of  independent 
measures  which  will  indicate  the  differences  between  the  groups.   The 
DISCRIMINANT ANALYSIS program finds the independent linear functions of
the variables which maximally discriminate between the populations (groups
input).  The results from this program, namely the discriminant functions,
may  be  used  in  the  CLASSIFICATION  program  to  determine  the  probability  that 
any  subject  belongs  in  any  group.   Also,  by  looking  at  the  coefficients  of 
the  functions,  we  can  determine  to  what  extent  each  of  the  p  variates 
contributes  to  each  function.   In  order  to  do  this  we  need  to  determine  the 
coefficients  of  the  functions  such  that  the  ratio  of  variances  between 
groups  to  the  variances  within  groups  is  maximized,  i.e.  the  differences 
between  groups  are  to  be  large  relative  to  the  differences  within  groups. 

In matrix terms, we are trying to maximize the ratio

$$\frac{f_i' A f_i}{f_i' W f_i}$$

where $f_i$ is the eigenvector associated with the $i$th eigenvalue $\lambda_i$ of $W^{-1}A$,

A = the covariance matrix between means,

$$a_{ij} = \sum_{g=1}^{k} N_g (\bar{X}_{ig} - \bar{X}_i)(\bar{X}_{jg} - \bar{X}_j)$$

and W = the covariance matrix within classes,

$$w_{ij} = \sum_{g=1}^{k} \left[ \sum_{n=1}^{N_g} (X_{ign} - \bar{X}_{ig})(X_{jgn} - \bar{X}_{jg}) \right]$$

where k = number of groups, $N_g$ = number of subjects in group g, N = total
number of subjects, and i and j run from 1 to p, where p = number of
variables.

To find the maximum, we derive from the partial derivatives of that
ratio, the matrix equation

$$(W^{-1}A - \lambda I)F = 0$$

where F is the matrix of eigenvectors.  The eigenvectors are the coefficients
of the discriminant functions.  The relative sizes of the eigenvalues
indicate the extent to which the associated discriminant functions distinguish
among the groups.  The percentage of the total discriminating power of
the variables contained in the jth discriminant function is represented by

$$100 \left( \frac{\lambda_j}{\sum_{i=1}^{N} \lambda_i} \right) \qquad \text{(N should be the smaller of } k-1 \text{ and } p\text{)}$$

In addition to obtaining the eigenvalues and discriminating coefficients,
the program will compute scaled vectors to show the relative contributions
of the variables to the discriminant function by

$$f_{ij}' = (w_{ii})^{1/2} f_{ij}$$

II.  INPUT 

Input  to  the  DISCRIMINANT  ANALYSIS  program  consists  of  two  or  more 
data  groups.   Each  data  group  consists  of  a  set  of  observations  on 
two  or  more  variables.   All  of  the  groups  must  contain  observations  on 
the  same  set  of  variables.   The  groups  may  be  input  as  separate  card  decks 
(each  preceded  by  a  DATA  format  card  and  followed  by  an  END#  card),  as 
data  groups  located  on  separate  temporary  storage  areas,  or  as  a 
mixture  of  data  groups  on  card  decks  and  data  decks  on  temporary  storage 
areas.   For  examples  see  Section  VIII. 

III.  SIGNIFICANCE  TESTS 

The  measure  of  significance  calculated  in  the  DISCRIMINANT  ANALYSIS 
program  is  a  Wilks'  lambda  (likelihood  ratio  test  statistic).   This  is 
a  test  of  the  discriminating  power  of  the  test  battery.   It  tests  the 
hypothesis  that  the  population  centroids  (mean  vectors)  are  equal  for 
the k groups. The Wilks' lambda is a function of the roots of $W^{-1}A$ and
is of the following form:

$$\Lambda = \prod_{i=1}^{r} \frac{1}{1 + \lambda_i}$$

where r is the lesser of k - 1 and p. In matrix terms this criterion is
defined in the following manner:

$$\Lambda = \frac{|W|}{|T|}$$

where W is the pooled within groups deviation score cross-products matrix and
T is the total sample deviation score cross-products matrix. As |T| increases
relative to |W| the ratio decreases in value with an accompanying increase in
the confidence that the group centroids are not equal.
An F ratio which yields an approximate test of the significance of
the Wilks' lambda is calculated and printed.

$$F = \left(\frac{1 - y}{y}\right)\left(\frac{ms + 2\lambda}{2r}\right)$$

where

$$s = \sqrt{\frac{p^2 q^2 - 4}{p^2 + q^2 - 5}}, \qquad m = n - \frac{p + q + 1}{2}, \qquad \lambda = -\frac{pq - 2}{4}, \qquad r = \frac{pq}{2}, \qquad y = \Lambda^{1/s}$$

and

q = k - 1
n = N - 1
N = total number of subjects
k = number of groups
p = number of variables

(The constant $\lambda$ in this formula is part of the approximation and is not
one of the eigenvalues $\lambda_i$.)

The degrees of freedom to be used with the F value printed in the output
are printed and are labeled F1 (degrees of freedom for the numerator)
and F2 (degrees of freedom for the denominator) and equal 2r and
ms + 2$\lambda$.
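As a check on the formulas of this section, the following Python sketch evaluates Wilks' lambda from a list of eigenvalues and then the approximate F ratio with its degrees of freedom. The function names are hypothetical, and setting s = 1 when p^2 + q^2 - 5 = 0 is an assumption made here (the write-up does not say how that case is handled).

```python
import math

def wilks_lambda(eigenvalues):
    """Product of 1 / (1 + lambda_i) over the roots of W^{-1}A."""
    lam = 1.0
    for x in eigenvalues:
        lam *= 1.0 / (1.0 + x)
    return lam

def approximate_f(wilks, p, k, n_total):
    """Return (F, F1, F2) for the approximate F test of Wilks' lambda."""
    q = k - 1
    n = n_total - 1
    denom = p * p + q * q - 5
    # Assumption: s = 1 when the denominator vanishes (numerator is then 0 too).
    s = 1.0 if denom == 0 else math.sqrt((p * p * q * q - 4) / denom)
    m = n - (p + q + 1) / 2.0
    lam = -(p * q - 2) / 4.0      # the constant lambda, not an eigenvalue
    r = p * q / 2.0
    y = wilks ** (1.0 / s)
    f1 = 2 * r                    # numerator degrees of freedom
    f2 = m * s + 2 * lam          # denominator degrees of freedom
    return ((1 - y) / y) * (f2 / f1), f1, f2
```

For two groups (q = 1) the result reduces to the familiar exact test, F = ((1 - Λ)/Λ)(N - p - 1)/p.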

IV.   OUTPUT 

The  output  consists  of  the  following: 

1.  Means  of  input  variables  for  each  group  and  group  sample  size 
(Parameter  Number  8) 

2.  A  dispersion  matrix  for  each  group.   (Parameter  Number  9) 

3.  The total sample deviation score cross-products matrix

$$t_{ij} = \sum_{n=1}^{N} (X_{in} - \bar{X}_i)(X_{jn} - \bar{X}_j)$$

where i and j range over the variables. This matrix is the sum
of the A and W matrices described in Section I. This is the T matrix
referred to in Parameter Number 7. The diagonal of this matrix
contains the sums of squares. (Parameter Number 6)

4.  The pooled within-groups deviation scores cross-products matrix which
is labeled W on the output. (Parameter Number 6)

5.  The  total  number  of  subjects  in  all  the  groups  combined.   (Parameter 
Number  6 ) 

6.  The  means  and  standard  deviations  of  the  variables  across  all 
groups.   (Parameter  Number  6) 

7.  The  correlation  matrix  of  variables  over  all  groups.   (Parameter 
Number  6 ) 
8.  The among groups cross-products of deviations of group from grand
means weighted by group sizes. This matrix is labeled A matrix on
the output. (Parameter Number 6)

9.  The eigenvalues for the $W^{-1}A$ matrix. (Printed)

10.  The eigenvalues and percentage of variance explained by each
additional eigenvalue. (Printed on output, automatically)

11.  The trace of the $W^{-1}A$ matrix. This is the sum of the eigenvalues.
(Printed on output, automatically)

12.  The discriminant functions ($f_{ij}$). The number of discriminant
functions will equal r where r is the lesser of the two values
k-1 and p, where k = the number of groups and p = number of
variables. (Parameter Number 1)

13.  The group means on the discriminant functions. This is a
k x r matrix formed by multiplying the group means on variables
and the discriminant functions. The matrix may be used to determine
the relative positions of the groups on the derived function.
(Parameter Number 11)

14.  The scaled vectors. These vectors are formed by multiplying the
discriminant functions by the square roots of the diagonal of
the W matrix described above. The scaled vectors show the
relative contributions of the input variables to each of the
discriminant functions. (Parameter Number 5)

15.  The measures of significance described in Section III. (Parameter
Number 4)

V.   RESTRICTIONS 

VI.   PARAMETERS

The parameters for the DISCRIMINANT ANALYSIS program follow the program name on the main
program  card.   Each  parameter  must  be  enclosed  in  parentheses.   The  parameters 
must  appear  in  the  order  given  below.   If  a  parameter  is  not  needed,  do  not 
punch  anything  between  its  parentheses.   All  parentheses  after  the  last 
non-empty  pair  may  be  omitted. 

Parameter
Number        Use or Meaning

 1     Ω      Output Address of discriminant functions
              (matrix f_ij).  SEQUENTIAL 1-15 and/or
              PRINT.  (Needed for CLASSIFICATION).

 2            Number of variables.

 3            Number of groups.

 4            1 if desire significance measures printed.

 5            1 if desire scaled discriminant vectors
              printed.

 6            1 if desire intermediate results printed.

 7            1 if input is W and T matrices instead
              of raw data.

 8            Output Address of group means on original
              variable and sample size (printed only).*
              SEQUENTIAL 1-15 and/or PRINT.  (See Parameter
              12)  (Needed for CLASSIFICATION).

 9     Ω      Output Address of group dispersion matrices
              of original variables.*  SEQUENTIAL 1-15
              and/or PRINT.  (Needed for CLASSIFICATION).

10            N - total number of subjects in all groups
              combined.  This parameter is left blank if
              raw data is input rather than W and T
              matrices.

11     Ω      Output Address of group means on discriminant
              functions.  SEQUENTIAL 1-15 and/or PRINT.

12            Output Address of group sample sizes.  Possibly
              needed for CLASSIFICATION.


Input addresses of raw data input groups or W and T matrices
are listed on a $INPUT card. The number of raw data groups is limited
to 16. W precedes T in the sequence when these are input. W and T
may be output using a $OUTPUT card. Output order is the same as
input.

*   If W and T are input instead of raw data, group means and dispersion
matrices are not printed. Means and dispersion matrices on discriminant
functions are not computed in this case.

Ω   It is possible to print in F format and/or punch the output from
these parameters. If you need either of these options, see the section in
the INTRODUCTION on INPUT and OUTPUT.

VII.   SPECIAL  COMMENTS 


1.  This program does not check for missing data. All blank spaces
are read as zeros.

2.  The  user  is  cautioned  against  using  the  DISCRIMINANT  ANALYSIS 
program  without  an  understanding  of  the  statistical  technique 
used.   See  the  references  (Section  IX). 
VIII.   EXAMPLES

A 

B 

/*ID 

/*ID

//   EXEC   SOUP 

//   EXEC   SOUP 

//SYSIN  DD   * 

//SYSIN  DD   * 

MATRIX . 

DIS(S1/P)(33)(M( 

MOV(C)(S5). 

$INPUT(C)(C). 

END  P 

END  SOUP 

DIS(S1)(U0)(U)(1)()()()(32/P). 

DATA(33)(33F2.0) 

$INP(C)(C)(S5)(C). 

, 

END  SOUPAC 

(Data  for  W) 

DATA  ( 1+0  )(1+0F2.0) 

END# 

DATA(33)(33F2.0) 

(Data  for  T) 

END# 
/* 

END# 
/* 

In example A, four groups of data are being input with the first
two groups coming from cards, the third from temporary storage on S5
and the fourth from cards. Discriminant functions are stored on
S1 and printed and group means are stored on S2 and printed.
Significance measures are calculated for 33 variables input. In example
B, W and T matrices are input and much the same results are obtained as
for example A. Group means cannot be calculated.
/*ID    (accounting  information) 
//   EXEC   SOUPAC 
//SYSIN  DD   * 

DIS(S1)(15)(2)()()()()(S2)(S3)()()(S4).
$INP(C)(C).

CLA(S1)(S2)(S3)(C)(P)(P)(2)(15)(1)(S4).

END  S 

DATA(15) 


END# 
/* 

This illustrates the use of the DISCRIMINANT and CLASSIFICATION programs.
The DISCRIMINANT program will save discriminant functions on S1; it
will operate on 15 variables for each of two groups. It will output
group means on S2 and group dispersion matrices on S3, and store group
sample sizes on S4.


CLASSIFICATION in turn will read discriminant functions, group means,
and group dispersion matrices from S1, S2, and S3 respectively. It
will read the group to be classified from cards and print the inverses of
the dispersion matrices and the chi-square and probability. It will expect
two sets of group means and dispersion matrices from 15 variables but
1 discriminant function (See Section IV). Original group sample
sizes are read from S4.


References on output:

Wilks' Lambda - first formula, page 119, Cooley and Lohnes.
F ratio - formula 4.1, page 62, Cooley and Lohnes.


T-TEST

I.   General Description

The T-TEST program calculates a T coefficient or F ratio as described
below:

Suboperation (1):  Paired T-Test (also called correlated T-Test).
Variables are in a row; variable one is paired with variable two, three with
four, etc., and a paired T coefficient is calculated for each pair as
follows:

$$t = \bar{d} / S_{\bar{d}}$$

$$\bar{d} = \sum_{i=1}^{N_j} [X_\alpha - X_\beta] / N_j$$

where $\alpha$ and $\beta$ are the j-th pair of variables, and

$$S_{\bar{d}} = \left[ \frac{N_j \sum_{i=1}^{N_j} [X_\alpha - X_\beta]^2 - \big( \sum_{i=1}^{N_j} [X_\alpha - X_\beta] \big)^2}{N_j^2 \cdot f} \right]^{1/2}$$

where f = degrees of freedom = $N_j$ or $N_j - 1$ as desired and $N_j$ is the sample
size for the j-th pair of variables.
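A short Python sketch of the paired T coefficient, written from the formulas for Suboperation (1). The function name and the `use_n` flag (standing in for the degrees-of-freedom parameter) are hypothetical, and the square root in the standard error is assumed.

```python
import math

def paired_t(x_alpha, x_beta, use_n=False):
    """Paired t for one pair of variables; f = N - 1 unless use_n is True."""
    diffs = [a - b for a, b in zip(x_alpha, x_beta)]
    n = len(diffs)
    f = n if use_n else n - 1
    sum_d = sum(diffs)
    sum_d2 = sum(d * d for d in diffs)
    d_bar = sum_d / n
    # Standard error of the mean difference.
    s_dbar = math.sqrt((n * sum_d2 - sum_d ** 2) / (n * n * f))
    return d_bar / s_dbar, d_bar, s_dbar
```

With f = N - 1 this is the usual paired t statistic; with f = N the standard error is slightly smaller.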

Suboperation (2):  Paired T-Test for all possible combinations of
variables computed as in Suboperation (1).

Suboperation  (3):   Test  of  differences  from  a  known  population  mean. 
A  population  mean  must  be  provided  for  each  column  of  data  or  the 
mean  will  be  set  to  zero.   Population  means  should  be  provided  as  a 
row vector. The following are calculated and printed for each variable:

t value:

$$t = (\bar{X} - \mu) / S_{\bar{x}}$$

where $\mu$ = parameterized value or zero

Mean:

$$\bar{X} = \sum_{i=1}^{N} X_i / N$$

Standard Deviation:

$$\text{S.D.} = \left[ \frac{N \sum X_i^2 - (\sum X_i)^2}{N \times (\text{d.f.})} \right]^{1/2}$$

NOTE:  N-1 is the usual degrees of freedom, but N may be specified.

Standard Error of Mean:

$$S_{\bar{x}} = \frac{\text{S.D.}}{\sqrt{N}}$$
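The test against a known population mean can be sketched as follows. This is an illustration only: the function name and the `use_n` flag are hypothetical, and a missing population mean is taken as zero, as the text describes.

```python
import math

def one_sample_t(x, mu=0.0, use_n=False):
    """Return (t, mean, S.D., standard error) for one variable."""
    n = len(x)
    df = n if use_n else n - 1
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    mean = sum_x / n
    sd = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * df))
    se = sd / math.sqrt(n)
    return (mean - mu) / se, mean, sd, se
```

Suboperation (4) uses the same last two steps, starting from a mean, standard deviation, and sample size that have already been computed.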

Suboperation (4):  Test of differences from a known population mean for
previously analyzed data: the mean (X), standard deviation (S.D.), and
sample size (N) as well as the population mean [see Suboperation (3)] are
read in and the program computes the standard error of the mean and T
for each trio of the X, S.D., and N read in that order. If R trios of
data are used (R observations on rows occur) then R population means
should be given. Calculations are the same as in Suboperation (3).

NOTE:  A column of data of Suboperation (3) is reduced to a three item
row here.

Suboperation (5):  Test of differences between two or more group means
taken pairwise: each group is located on a separate storage location
or set of cards. The following are calculated:

t value:

$$t = (\bar{X}_i - \bar{X}_j) / S_{\bar{x}_i - \bar{x}_j}$$

Mean:

$$\bar{X}_i = \sum X_i / N_i$$

Standard Deviation:

$$\text{S.D.}_i = \left[ \frac{N_i \sum X_i^2 - (\sum X_i)^2}{N_i \times (\text{d.f.})_i} \right]^{1/2}$$

Pooled Estimate of Variance:

$$S^2 = \left[ \frac{N_i \sum X_i^2 - (\sum X_i)^2}{N_i} + \frac{N_j \sum X_j^2 - (\sum X_j)^2}{N_j} \right] \Big/ \left[ (\text{d.f.})_i + (\text{d.f.})_j \right]$$

Estimate of Standard Error:

$$S_{\bar{x}_i - \bar{x}_j} = \left[ \frac{S^2 (N_i + N_j)}{N_i N_j} \right]^{1/2}$$
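The pairwise test of group means can be sketched in Python as below. The printed formulas are hard to read in the source, so this sketch assumes the conventional pooled estimate: the two within-group sums of squares added together and divided by the sum of the two degrees of freedom. Function names and the `use_n` flag are hypothetical.

```python
import math

def _sum_of_squares(x):
    """Sum of squared deviations from the mean, in raw-score form."""
    return sum(v * v for v in x) - sum(x) ** 2 / len(x)

def pooled_t(xi, xj, use_n=False):
    """Return (t, pooled variance, standard error) for two groups."""
    ni, nj = len(xi), len(xj)
    dfi = ni if use_n else ni - 1
    dfj = nj if use_n else nj - 1
    s2 = (_sum_of_squares(xi) + _sum_of_squares(xj)) / (dfi + dfj)
    se = math.sqrt(s2 * (ni + nj) / (ni * nj))
    t = (sum(xi) / ni - sum(xj) / nj) / se
    return t, s2, se
```

With more than two groups, the same function would be applied to each pair of groups in turn.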

Suboperation  (6) :   One-way  analysis  of  variance.   The  data  to  be  compared 
are  located  on  different  storage  units  or  sets  of  cards.   Calculations  are 
made  as  follows  for  each  variable : 

S = number of storage units = number of subgroups

$N_i$ = number of observations in the i-th subgroup, i = 1, ..., S, and for
each i, j = 1, ..., $N_i$

$X_{ij}$ = element in the j-th row of the i-th subgroup

$$N = \sum_{i=1}^{S} N_i = \text{total observations}$$

$$\text{Total SS} = \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij}^2 - \Big( \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij} \Big)^2 / N$$

$$\sum_{j=1}^{N_i} X_{ij} = \text{sum for each constant } i \text{ (i.e. over the subgroup)}$$

$$\text{Between SS} = \sum_{i=1}^{S} \Big[ \Big( \sum_{j=1}^{N_i} X_{ij} \Big)^2 / N_i \Big] - \Big( \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij} \Big)^2 / N$$

Within SS = Total SS - Between SS

Within D.F. = Total D.F. - Between D.F.

$$F = \frac{\text{BSS} / \text{BDF}}{\text{WSS} / \text{WDF}}$$
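The one-way analysis of variance of Suboperation (6) can be sketched for a single variable as follows. The write-up does not state the total and between degrees of freedom; this sketch assumes the usual N - 1 and S - 1. The function name and the returned dictionary keys are hypothetical.

```python
def one_way_anova(subgroups):
    """subgroups: list of lists of observations, one list per subgroup."""
    s = len(subgroups)
    n = sum(len(g) for g in subgroups)
    grand_total = sum(sum(g) for g in subgroups)
    correction = grand_total ** 2 / n

    total_ss = sum(x * x for g in subgroups for x in g) - correction
    between_ss = sum(sum(g) ** 2 / len(g) for g in subgroups) - correction
    within_ss = total_ss - between_ss

    between_df = s - 1                 # assumed: S - 1
    within_df = (n - 1) - between_df   # assumed total D.F. of N - 1
    f = (between_ss / between_df) / (within_ss / within_df)
    return {"total_ss": total_ss, "between_ss": between_ss,
            "within_ss": within_ss, "between_df": between_df,
            "within_df": within_df, "f": f}
```

In the program the same calculation is repeated for each input variable.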


Suboperation (7):  One-way analysis of covariance. The experimental
(dependent) variable comes first followed by up to 49 covariates. The
dependent variable is adjusted to the set of covariates, not iteratively
to one covariate at a time. Subgroups or factor levels are handled as
in analysis of variance, Suboperation (6), i.e. as separate input decks
or temporary storage locations. Coefficients obtained are means and standard
deviations, test of homogeneity of regression, F-ratio for covariance,
and adjustment coefficients. For further discussion see Winer, p. 578 ff.

References 

Bryant, E. C., Statistical Analysis.  New York:  McGraw-Hill, 1960.

Snedecor,  G.  W. ,  Statistical  Methods.   Ames:   Iowa  State  College 
Press,  1957. 

Winer,  B.  J.,  Statistical  Principles  in  Experimental  Design. 
New  York:   McGraw-Hill,  1962. 

II.   Restrictions 


The maximum number of input variables is 150 except for Suboperation
(2), all possible pairs, and Suboperation (7), covariance, where the
maximum is 50. Suboperations (5) and (6), tests of differences and analysis
of variance, require the data to be divided into 2 or more subgroups. The
maximum number of subgroups is 14.


III.   Parameters 


T-TEST  and  ANALYSIS  OF  VARIANCE 

Parameter
Number        Use or Meaning

 1            Suboperations 1-7.  (See above).

 2            Number of subgroups (if applicable).

 3            0 - Count blanks as zeros
              1 - Count blanks as missing data

 4            0 - use N-1 as degrees of freedom
              1 - use N as degrees of freedom (See Special
                  Comments).

Input addresses of subgroups for options 5, 6, and 7 are listed on a
$INP card. (See section on SOUPAC Input/Output). The maximum number of
subgroups is 14. See example for illustration. If options 1 or 2 are
used, provide only 1 input address. For option 3, provide two addresses,
the first for the sample being tested, the second for the criterion means.
For option 4 provide two input addresses, the first for the sample being
tested, the second for the criterion means, standard deviations and sample
size.

IV.   Special  Comments 

If it is desired to use the option of comparing the mean with some
population mean other than zero as in Suboperations (3) and (4), a row
vector of $\mu$ must be included in temporary storage. This row vector must
be of length N = number of variables.

Most  work  requires  N-l  degrees  of  freedom. 


V.   Examples 

T-TEST (1).
$INP(S1).

Paired T-Test on data stored on S1.

T-TEST (2)()(1)(1).
$INP(S15).

Paired T-Test on data stored on S15, doing test on all possible pairs,
checking for missing data and using N degrees of freedom.

T-TEST (3).
$INP(S1)(S3).

T-Test of population means on S1 against criterion means on S3.

T-TEST (5)(2).
$INP(S1)(S2).

T-Test of group mean of two groups, one on S1 and the other on
S2.

T-TEST (6)(4)(1).
$INP(S1)(C)(C)(S3).

One-way analysis of variance over four groups, one on S1, two from
cards, and one from S3, checking for missing data.


CORRELATIONS  AND  REGRESSION  PACKAGE 


BISERIAL  CORRELATION


I.   General Description

This program calculates the following coefficients for each combination
of one dichotomous and one continuous variable.

Case totals:

% cases in p = $N_p / N$
% cases in q = $N_q / N$
Total cases = N

Mean:

$$\bar{X}_p = \sum X_p / N_p$$
$$\bar{X}_q = \sum X_q / N_q$$
$$\bar{X} = \sum X / N$$

Standard deviation:

$$S_p^2 = \sum X_p^2 / N_p - \bar{X}_p^2$$
$$S_q^2 = \sum X_q^2 / N_q - \bar{X}_q^2$$
$$S^2 = \sum X^2 / N - \bar{X}^2$$

Biserial r:

$$r_b = \frac{(\bar{X}_p - \bar{X}_q)}{S} \cdot \frac{pq}{h}$$

where p = percentage of cases in 0 category
q = percentage of cases in 1 category
h = height of the normal curve computed from normal tables
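A Python sketch of the biserial r for one dichotomous and one continuous variable follows. It assumes the conventional form of the coefficient, (mean difference / S) times (pq / h), with h taken as the ordinate of the standard normal curve at the point that divides the area into p and q; where the manual reads h from normal tables, the sketch uses `statistics.NormalDist` instead. Missing values are represented by `None`, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def biserial_r(dichotomous, continuous):
    """Biserial r; cases with a missing value (None) in either variable are skipped."""
    pairs = [(d, x) for d, x in zip(dichotomous, continuous)
             if d is not None and x is not None]
    xp = [x for d, x in pairs if d == 0]   # 0 category
    xq = [x for d, x in pairs if d == 1]   # 1 category
    n = len(pairs)
    p, q = len(xp) / n, len(xq) / n

    mean = sum(x for _, x in pairs) / n
    s = math.sqrt(sum(x * x for _, x in pairs) / n - mean ** 2)

    # Height of the normal curve at the cut dividing areas p and q.
    h = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (sum(xp) / len(xp) - sum(xq) / len(xq)) / s * (p * q / h)
```

Unlike an ordinary correlation, the biserial coefficient computed this way is not confined to the range -1 to +1 in small or non-normal samples.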

The  program  checks  for  missing  data,  and  computes  the  above  measures 
only  for  those  cases  where  both  dichotomous  and  continuous  variables  are 
present. 

II .   Restrictions 

A.  Dimension 

The  input  for  the  BISERIAL  R  program  is  limited  to  a  maximum  of 
20  dichotomous  variables  and  100  continuous  variables. 

B.  Type  of  input 

Data  is  read  row-wise  from  either  tape  or  cards.   All  dichotomous 
variables  must  be  first  in  each  row.   They  should  be  coded  with 
0  and  1 

III .   Parameters 

The program call card requires 4 parameters after the program name,
BISERIAL R:

Parameter
Number        Use or Meaning

 1            Input Address.  CARDS or SEQUENTIAL 1-15.

 2            Output Address of correlations.  SEQUENTIAL 1-15
              and/or PRINT.

 3            Number of dichotomous variables.

 4            Number of continuous variables.

 5            0 - ignore blanks
              1 - count blanks as zeros


CANONICAL  ANALYSIS 

I.   GENERAL  DESCRIPTION 

A.   Mathematics 

The  Canonical  Correlation  program  provides  a  multivariate  test  of  the 
hypothesis  that  two  sets  of  normally  distributed  variables  are  independent. 
The  larger  set  of  variables  (the  predictor  variables)  is  considered  to 
have  q  members,  and  the  smaller  set  (criterion  variables)  has  p  members. 
This program also linearly transforms each set of variables into a new set
of  independent  variables,  (or  dimensions)  such  that  the  first  new  predictor 
(a  linear  combination  of  the  original  predictors)  has  maximum  correlation 
with  the  first  new  criterion  variable.   The  second  new  predictor  is 
maximally  correlated  with  the  second  new  criterion,  and  so  on  (with  the 
constraint  that  each  new  variable  is  uncorrelated  with  the  previous  new 
variables  derived  from  the  same  set  of  original  variables  ) . 

Let the criteria set consist of the p variables $x_1, \ldots, x_p$ and
the predictor set of q other variates $x_{p+1}, \ldots, x_{p+q}$. Assume $p \le q$. We
then look for weighting matrices $c_{\alpha\beta}$ and $d_{ab}$ such that

$$\xi_\alpha = \sum_{\beta} c_{\alpha\beta}\, x_\beta \qquad \alpha = 1, 2, \ldots, p; \quad \beta = 1, 2, \ldots, p$$

$$\eta_a = \sum_{b} d_{ab}\, x_b \qquad a = 1, 2, \ldots, p; \quad b = p+1, p+2, \ldots, p+q$$

The variables $\xi$ and $\eta$ have the following properties:

1.  They are standardized variables.

2.  Within each set, the $\xi$'s are independent and the $\eta$'s are independent.

3.  Within the set $\xi_\alpha$ (as $\alpha$ runs from 1 through p) and within the set of $\eta_a$
(as a runs from 1 through p) the correlation is zero. These matrices
are formed in such a way that $\xi_1$ and $\eta_1$, $\xi_2$ and $\eta_2$, etc., are maximally
correlated giving p correlations $\lambda_1, \ldots, \lambda_p$.

The purpose of this program is to find the p correlations $\lambda_1, \ldots, \lambda_p$ and
the weighting matrices $c_{\alpha\beta}$ and $d_{ab}$.

B.   Procedure

The correlation matrix R is first partitioned into:

$$R = \begin{pmatrix} A & C' \\ C & B \end{pmatrix}$$

where A = correlation among predictors
C = correlation between predictors and criteria
B = correlation among criteria
The matrix equation

$$(C A^{-1} C' - \lambda^2 B)\, U = 0$$

is solved to find the p correlations $\lambda_1, \ldots, \lambda_p$ and the criteria weighting
matrix U whose elements are $c_{\alpha\beta}$. The eigenvalues D and eigenvectors H of B
are found: $C A^{-1} C'$ is then post-multiplied by $H D^{-1/2}$ and pre-multiplied by
its transpose to reduce the equation to the standard form $(Z - \lambda^2 I)U = 0$.
Then the predictor weighting matrix V whose elements are $d_{ab}$ is computed by
the equation

$$V = (A^{-1} C' U)\, \frac{1}{\lambda}$$
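The procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code: it assumes NumPy, assumes the correlation matrix is ordered with the q predictors first and the p criteria last (so that C is the p x q block), and uses the hypothetical name `canonical_sketch`.

```python
import numpy as np

def canonical_sketch(R, q, p):
    """R: (p+q) x (p+q) correlation matrix, predictors first (assumed order)."""
    R = np.asarray(R, dtype=float)
    assert R.shape == (p + q, p + q)
    A = R[:q, :q]          # correlations among predictors
    B = R[q:, q:]          # correlations among criteria
    C = R[q:, :q]          # criteria x predictors block

    M = C @ np.linalg.solve(A, C.T)        # C A^{-1} C'
    d, H = np.linalg.eigh(B)               # eigenvalues D, eigenvectors H of B
    T = H / np.sqrt(d)                     # H D^{-1/2} (columns rescaled)
    Z = T.T @ M @ T                        # standard symmetric form
    lam2, Y = np.linalg.eigh(Z)
    order = np.argsort(lam2)[::-1]         # largest correlation first
    lam2, Y = lam2[order], Y[:, order]

    lam = np.sqrt(np.clip(lam2, 0.0, None))  # canonical correlations
    U = T @ Y                                # criterion weights
    V = np.linalg.solve(A, C.T @ U) / lam    # predictor weights (lam > 0 assumed)
    return lam, U, V
```

With a single criterion variable the one canonical correlation returned is the multiple correlation of that criterion with the predictors.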


II.   INPUT 

Input  to  the  CANONICAL  ANALYSIS  program  consists  of  a  correlation  matrix. 
These  variables  include  a  set  of  predictor  variables  and  a  set  of  criterion 
variables.   Either  set  may  be  first  on  the  input  data  but  there  can  be  no 
mixing  of  the  two  types  of  variables  on  the  input  data.   The  TRANSFORMATION 
program  may  be  used  to  reorder  the  variables  if  they  are  mixed  on  the 
card  data  deck. 

III.   SIGNIFICANCE  TESTS 

Included in the printed output of the CANONICAL program is a Chi-square
value for each of the eigenvalues $\lambda^2$ computed in the program. The chi-square
values printed are determined from the Wilks' lambda values using the
procedure outlined by Bartlett (see Section IX). The chi-square values
provide  a  test  of  the  null  hypothesis  that  the  p_  variates  are  unrelated  to 
the  q  variates.   If  there  is  at  least  one  way  in  which  a  linear  combination 
of  the  predictor  variables  is  correlated  with  a  linear  combination  of 
the  criterion  variables  this  Chi-square  value  will  be  significant.   The 
second  Chi-square  may  then  be  examined.   This  Chi-square  is  a  test  of 
a  second  relationship  after  the  first  relationship  has  been  removed.   If 
this  Chi-square  is  significant  a  second  linear  combination  of  the  predictor 
variables  is  correlated  with  a  second  linear  combination  of  the  criterion 
variables.   This  process  continues  until  the  first  non-significant 
Chi-square  is  found.   All  Chi-squares  beyond  that  point  will  be  non- 
significant. 

IV.   OUTPUT

The  output  consists  of  the  following: 

1.  The  matrix  of  standardized  regression  coefficients.  This  is  the  matrix 
of  coefficients  which  would  be  formed  if  the  raw  data  used  to  calculate 
the  correlation  matrix  input  had  been  converted  to  standard  scores. 

The  predictor  variables  are  on  the  rows  of  the  matrix  and  the  criterion 
variables  are  on  the  columns  of  the  matrix. 

2.  A multiple correlation squared ($R^2$) for each of the criterion variables.
The first $R^2$ value is the multiple correlation of the first criterion
variable with the entire set of predictor variables. The second is
for the second criterion variable with the set of predictors, etc.

3.  A set of eigenvalues $\lambda^2$, correlations $\lambda$, Wilks' lambdas, Chi-squares,
and degrees of freedom. (See Section III) (Printed)

4.  A matrix of criterion weights. (Parameter Number 3)

5.  A matrix of predictor weights. (Parameter Number 4)

V.   RESTRICTIONS

A.  Dimension

The maximum permissible number of variables in each set is 80.

B.  Special Conditions

The matrices A and B must both be non-singular.

C.  Number of criteria (p) must be less than or equal to the number of
predictors (q).

VI.  PARAMETERS 

The  parameters  for  the  CANONICAL  ANALYSIS  program  follow  the  program 
name  on  the  main  program  card.   Each  parameter  must  be  enclosed  in 
parentheses.  The  parameters  must  appear  in  the  order  given  below.   If 
a  parameter  is  not  needed,  do  not  punch  anything  between  its  parentheses. 
All  parentheses  after  the  last  non-empty  pair  may  be  omitted. 

Parameter
Number        Use or Meaning

 1            Input Address (correlation matrix).  CARDS
              or SEQUENTIAL 1-5.

 2            Output Address of canonical correlations.
              SEQUENTIAL 1-5 and/or PRINT.

 3     Ω      Output Address of criterion weighting
              matrix.  SEQUENTIAL 1-5 and/or PRINT.

 4     Ω      Output Address of predictor weighting
              matrix.  SEQUENTIAL 1-5 and/or PRINT.

 5            Number of predictor variables.

 6            Number of criterion variables.  (Must
              be less than or equal to number of
              predictors).

 7            Order of variable sets on input:
              1 if predictors are first
              2 if criteria are first

 8            Operation to be performed:
              1 if only canonical correlations
              2 if weights are to be computed

 9            1 if want regression coefficients printed

10            1 if want multiple correlation squared
              (R^2) printed


Ω   It is possible to print in F format and/or punch the output from these
parameters. If you need either of these options, see the section in the
Introduction on Input and Output.


VII.   SPECIAL  COMMENTS 

This  program  does  not  check  for  missing  data.   All  blank  spaces  are 
read  as  zeros. 


VIII.   EXAMPLES 
1A

/*ID   <accounting information>
//   EXEC   SOUP
//SOUP.SYSIN  DD   *
CANONICAL (CARDS)(SEQUENTIAL 1/PRINT)
(PRINT)(PRINT)(15)(5)(2)(1).
END S
DATA


END#
/*


1B

/*ID   <accounting information>
//   EXEC   SOUP
//SYSIN  DD   *
CAN (C)(S1/P)(P)(P)(15)(5)(2)(1).
END S
DATA


END#
/*



Examples 1A and 1B illustrate the use of the program call card for
a CANONICAL correlation program. In these examples the input data consists
of 15 predictor and 5 criterion variables and will be a card deck.
The predictor variables are the first 15 variables of each observation
in the data deck. The printed output will be the canonical correlations,
the criterion weighting matrix, and the predictor weighting matrix. The
canonical correlation matrix will also be stored on temporary storage
SEQUENTIAL 1. Examples 1A and 1B perform the same calculations.

IX.   REFERENCES 

For a discussion of the uses of canonical analysis, see Kendall, M. G.,
A Course in Multivariate Analysis, New York, Hafner, 1961, pp. 68-85
or Kendall, M. G., The Advanced Theory of Statistics, New York, Hafner,
1951, Vol. II, pp. 348-358.

For the derivation of the method used, see Anderson, T. W., An
Introduction to Multivariate Statistics, New York, Wiley, 1958,
pp. 288-296.

For significance test procedure, see Cooley, W. W. and Lohnes, P. R.,
Multivariate Procedures for the Behavioral Sciences, New York,
Wiley, 1962, p. 37.


CORRELATION 

I.   General  Description 

The  main  purpose  of  the  CORRELATION  program  is  the  calculation  of 
Pearson product-moment correlations (hereafter referred to as correlations
in this writeup). A correlation measures the linear dependency between
two variables, and this program calculates a correlation for each pair
of input variables. The square of a correlation, sometimes called the
coefficient of determination, represents the proportional reduction in
variance of one variable due to a linear relationship with another. Thus
the coefficient of determination measures the strength of a linear
relationship, or the proportion of variance accounted for by a linear rule.

The  CORRELATION  program  automatically  produces  other  types  of 
correlation  coefficients,  because  the  calculations  required  are  identical. 
Thus  point  biserial  coefficients  of  correlation  (often  preferred  to 
biserial  correlation),  phi  coefficients  (alternative  to  tetrachoric 
coefficients),  and  Spearman's  rank  order  correlations  can  be  readily 
obtained.   Thus,  if  the  input  consists  of  dichotomous  variables,  the 
output  will  contain  a  mixture  of  phi's,  point  biserials,  and  ordinary 
correlations.   (A  point  biserial  correlation  is  a  correlation  between 
a  dichotomous  variable  and  a  continuous  variable).   If  the  input  to 
the  correlation  program  consists  of  rank  ordered  data  (ordinal),  the 
output  will  be  Spearman's  rank  order  correlations.   (See  Walker  and  Lev, 
Chapter  11  for  comparisons  and  comments  on  the  above  mentioned  coefficients). 

In  the  process  of  calculating  the  correlations,  the  means  and  standard 
deviations  of  the  individual  variables  are  computed,  as  are  the  cross- 
products  and  covariances  between  variables.  After  the  correlations  have 
been  calculated,  they  are  used  to  calculate  the  linear  regression  coefficient 
and  corresponding  intercept  terms  needed  for  predicting  each  variable  from 
each  other  variable. 

II.  Input 

Input to the CORRELATION program consists of a set of independent
observations on two or more variables. The data is considered as a
two-dimensional array (or matrix) of numbers with each column containing
the observations on one variable, and each row consisting of one
observation on each variable. If we use the letter X to represent the matrix
of raw data, we let $X_{ij}$ represent the entry in the i-th row (where i = 1, 2, ... N) and
the j-th column (where j = 1, 2, ... M). In other words, we have N
observations (rows) and M variables (columns) in our data matrix X.

III.  Formulas  and  Calculations 

The following formulas define certain statistics and show how they
are calculated within the program. The subscript i refers to
observations  (or  individuals)  and  runs  from  1  to  N.   The  subscripts  j  and 
k  refer  to  variables,  and  they  run  from  1  to  M. 
Mean (of variable j):

$$\bar{X}_j = \frac{\sum_{i=1}^{N} X_{ij}}{N}$$

Covariance* (between variables j and k):

$$C_{jk} = \frac{\sum_{i=1}^{N} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{N - 1}
        = \frac{N \sum_{i=1}^{N} X_{ij} X_{ik} - \left(\sum_{i=1}^{N} X_{ij}\right)\left(\sum_{i=1}^{N} X_{ik}\right)}{N(N - 1)}$$

Standard Deviation* (of variable j):

$$S_j = \sqrt{\frac{\sum_{i=1}^{N} (X_{ij} - \bar{X}_j)^2}{N - 1}}
      = \sqrt{\frac{N \sum_{i=1}^{N} X_{ij}^2 - \left(\sum_{i=1}^{N} X_{ij}\right)^2}{N(N - 1)}}
      = \sqrt{C_{jj}}$$

Correlation (between variables j and k):

$$R_{jk} = \frac{C_{jk}}{S_j S_k}
        = \frac{N \sum_{i=1}^{N} X_{ij} X_{ik} - \left(\sum_{i=1}^{N} X_{ij}\right)\left(\sum_{i=1}^{N} X_{ik}\right)}
               {\sqrt{N \sum_{i=1}^{N} X_{ij}^2 - \left(\sum_{i=1}^{N} X_{ij}\right)^2}\;
                \sqrt{N \sum_{i=1}^{N} X_{ik}^2 - \left(\sum_{i=1}^{N} X_{ik}\right)^2}}$$

From the equation X_ij = B_jk X_ik + A_jk, the program calculates

Linear Regression Coefficient (for predicting variable j from variable k):

$$B_{jk} = R_{jk} \left(\frac{S_j}{S_k}\right)$$

Intercept (constant term in equation for predicting variable j from
variable k):

$$A_{jk} = \bar{X}_j - B_{jk} \bar{X}_k$$

*NOTE :   the  sample  covariances  and  sample  standard  deviations  are  unbiased 
estimates  of  the  corresponding  population  parameters.   The  definitions 
given  here  follow  the  practice  of  many  current  statisticians.   [See  Anderson 
(1958)  -  Chapter  3  for  example.] 
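
The formulas above can be sketched in a few lines of Python. This is only an illustration of the computational (sums and cross-products) forms given in this section, not the SOUPAC source; the function name `correlation_stats` and the dictionary keys are hypothetical.

```python
import math

def correlation_stats(x, j, k):
    """Statistics for predicting variable j from variable k.

    x is a list of rows (observations); j and k are column indexes.
    Hypothetical helper that follows the sums-of-products formulas above.
    """
    n = len(x)
    sum_j = sum(row[j] for row in x)
    sum_k = sum(row[k] for row in x)
    sum_jj = sum(row[j] ** 2 for row in x)
    sum_kk = sum(row[k] ** 2 for row in x)
    sum_jk = sum(row[j] * row[k] for row in x)

    mean_j = sum_j / n
    mean_k = sum_k / n
    # Unbiased covariance and standard deviations (divisor N(N-1)).
    c_jk = (n * sum_jk - sum_j * sum_k) / (n * (n - 1))
    s_j = math.sqrt((n * sum_jj - sum_j ** 2) / (n * (n - 1)))
    s_k = math.sqrt((n * sum_kk - sum_k ** 2) / (n * (n - 1)))
    r_jk = c_jk / (s_j * s_k)
    # Regression of variable j on variable k.
    b_jk = r_jk * (s_j / s_k)
    a_jk = mean_j - b_jk * mean_k
    return {"mean_j": mean_j, "mean_k": mean_k, "C": c_jk,
            "S_j": s_j, "S_k": s_k, "R": r_jk, "B": b_jk, "A": a_jk}

# Four observations on two variables; column 1 is exactly twice column 0.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
stats = correlation_stats(data, j=1, k=0)
```

For these data the second variable is an exact linear function of the first, so the correlation is 1, the regression coefficient is 2 and the intercept is 0.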

IV.  Significance Tests

If  we  assume  that  two  variables  (indexed  by  j  and  k)  have  a  bivariate 
normal distribution, there is a test statistic for testing the hypothesis
that the correlation in the population is zero (or equivalently that either
regression coefficient is zero). Even for a relatively small sample size
(N), this hypothesis can be tested using the t ratio:


$$t = \frac{R_{jk} \sqrt{N - 2}}{\sqrt{1 - R_{jk}^2}}$$

with N - 2 degrees of freedom. Other types of hypotheses can be tested
through use of the Fisher R to Z transformation. [See Hays (1963), pages
529 - 533 for example.]
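
As a small worked illustration of the t ratio, the sketch below evaluates it for a given correlation and sample size. The function name `t_ratio` is hypothetical; the result would still have to be compared with a t table at N - 2 degrees of freedom.

```python
import math

def t_ratio(r, n):
    """t statistic for testing that the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# A correlation of 0.5 observed on 27 cases (25 degrees of freedom).
t = t_ratio(0.5, 27)
```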

V.   Output 

Output  from  the  CORRELATION  program  may  consist  of  any  or  all  of 
the  statistics  from  section  III  above,  by  using  parameters  2  through  7. 
Any  output  from  this  program  may  be  printed  and/or  output  to  temporary 
storage  (SEQUENTIAL  1-5).   The  means,  standard  deviations,  and  the  sample 
size  (N)  are  output  as  a  matrix  with  M  rows  (one  for  each  variable)  and 
three  columns  (the  third  column  will  have  a  constant  value  of  N  for  all 
variables).  Correlations, covariances, and cross-products are printed
as  lower  triangular  matrices,  while  the  regression  coefficients  and 
intercepts  are  printed  as  square  matrices.  However,  all  five  of  these 
matrices  are  stored  as  square  matrices. 

VI.  Restrictions 

The  CORRELATION  program  will  accept  an  unlimited  number  of 
observations,  but  the  number  of  variables  is  limited  as  noted  in  the 
section  on  PROGRAM  LIMITS  in  the  INTRODUCTION. 

VII.  Parameters 


The  parameters  for  the  CORRELATION  program  follow  the  program 
name  on  the  main  program  card.  Each  parameter  must  be  enclosed  in 
parentheses.  The  parameters  must  appear  in  the  order  given  below. 
If  a  parameter  is  not  needed,  do  not  punch  anything  between  its 
parentheses.  All  parentheses  after  the  last  non-empty  pair  may  be 
omitted. 


Parameter
Number        Use or Meaning

1             Input Address of raw data (X matrix).
              CARDS or SEQUENTIAL 1-15 and/or PRINT.

2             Output Address for means, standard
              deviations, and sample size.
              SEQUENTIAL 1-15 and/or PRINT.

3             Output Address for correlation matrix
              (R).  SEQUENTIAL 1-15 and/or PRINT.

4             Output Address for cross-products
              matrix.  SEQUENTIAL 1-15 and/or PRINT.

5             Output Address for covariance matrix (C).
              SEQUENTIAL 1-15 and/or PRINT.

6             Output Address for matrix of regression
              coefficients (B).  SEQUENTIAL 1-15 and/or
              PRINT.

7             Output Address for intercepts (matrix A).
              SEQUENTIAL 1-5 and/or PRINT.

8             1 if last variable in each row is a
              weighting factor.

It  is  possible  to  print  in  F  format  and/or  punch  the  output  from  these 
parameters.   If  you  need  either  of  these  options,  see  the  section  in 
the  INTRODUCTION  on  INPUT  and  OUTPUT. 


VIII.   Special  Comments 

1.  This  program  does  not  check  for  missing  data.   All  blank  spaces 
are  read  as  zeroes.   If  you  have  missing  data,  use  the  MISSING 
DATA  CORRELATION  program. 

2.  In  the  output  matrices  of  regression  coefficients  and  intercepts, 
the  row  number  refers  to  the  dependent  variables,  and  the  column 
numbers  refer  to  the  independent  variables. 

3.  If  a  variable  is  constant,  an  error  message  will  be  printed  and 
all  correlations  with  that  variable  will  be  set  to  zero. 

4.  In order to have the program perform its calculations separately
for subsamples of the data, see the section on CONTROL VARIABLES
in the INTRODUCTION.


IX.  Examples 


1A

/*ID  <accounting information>
//  EXEC  SOUP
//SOUP.SYSIN  DD  *
CORRELATIONS  (CARDS)( )(PRINT).
END  SOUPAC
DATA  (6)(6F2.0)

END#
/*


1B

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
COR  (C)( )(P).
ENDS
DATA  (6)(6F2.0)

END#
/*


Example  1A  illustrates  the  usage  of  the  CORRELATION  program.   Notice 
that  all  words  are  spelled  out  although  this  is  unnecessary.  Notice  also 
that  correlations  are  to  be  printed  out,  although  the  means  and  standard 
deviations  are  not.   Example  IB  will  perform  exactly  the  same  computations 
as  1A,  except  that  all  instructions  have  been  abbreviated  to  make  keypunching 
easier. 
2

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
cor  (c)(p)(p/s1).
principal  axis  from  (s1)  to  (s2/p)  with  (10)  factors  and
(100)  percent  of  the  variance  to  be  removed.
varimax  rotation  from  (s2)  to  (print).
ENDS
DATA  (20)(10F4.0,5F6.2/10X,5F4.1)

END#
/*

In  the  second  example,  the  CORRELATION  program  first  prints  the  means 
and  standard  deviations.   Then  it  prints  the  CORRELATION  matrix  and  stores 
it on SEQUENTIAL 1 (S1).  The PRINCIPAL AXIS program then performs a
principal  components  analysis  and  outputs  10  components  to  S2.  VARIMAX 
then  rotates  these  10  components,  using  the  VARIMAX  criterion,  and  prints 
the  results. 

X.   References 

T. W. Anderson, An Introduction to Multivariate Statistical Analysis;
John Wiley and Sons, Inc., 1958.

E. C. Bryant, Statistical Analysis; McGraw-Hill, 1960, pp. 113-135.

W. L. Hays, Statistics for Psychologists; Holt, Rinehart and Winston,
1960.

H. M. Walker and J. Lev, Statistical Inference; Henry Holt and Company,
New York, 1960.


MISSING  DATA  CORRELATION 


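I.   General Description

The MISSING DATA CORRELATION program calculates the following
coefficients for every combination of variables:

Mean:

$$\bar{X}_{ij} = \frac{\sum X_{ij}}{N_{ij}}$$

Standard Deviation:

$$s_{ij} = \left[\frac{N_{ij} \sum X_{ij}^2 - \left(\sum X_{ij}\right)^2}{N_{ij}(N_{ij} - 1)}\right]^{1/2}$$

Covariance:

$$S_{ij} = \frac{N_{ij} \sum (XY)_{ij} - \left(\sum X_{ij}\right)\left(\sum Y_{ij}\right)}{N_{ij}(N_{ij} - 1)}$$

Correlation:

$$r_{ij} = \frac{S_{ij}}{Sx_{ij}\, Sy_{ij}}$$

Here X and Y denote the two variables of the pair (i, j), N_ij is the
sample size for that pair, and the sums run over the observations
used for that pair.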


II.   Restrictions 


The  maximum  number  of  variables  for  this  program  is  100. 

The  input  data  to  this  program  may  come  from  any  source  conforming  to 
SOUPAC.  Output  may  be  printed  and  the  correlation  matrix  may  be  placed  on 
any  source  conforming  to  SOUPAC. 


III.   Parameters 


The  parameters  for  the  MISSING  DATA  CORRELATION  program  appear  on  the 
program  card.   They  must  follow  the  program  name  in  the  following  order: 

Parameter
Number        Use or Meaning

1             Input Address.  CARDS or SEQUENTIAL 1-15.
              Default is CARDS.

2             0 - printing as usual
              1 - printing is suppressed

3             Output Address of correlation matrix.

4             Output Address for sample sizes.

5             Coding for missing data; if left blank or if
              zero is entered, minus zero is used as check.
              It is NOT possible for this program to count
              true zeroes as missing data.  This parameter
              must be enclosed in asterisks.  Example:  *99*.


NOTE:  All  output  is  in  double  precision. 
IV.   Special Comments

A.  The  user  is  warned  against  further  processing  of  the  correlations 
output  by  this  program  because  the  correlations  do  not  necessarily 
come  from  the  same  sample. 

B.  For control breaks, data must be presorted on the control variables
with the last variable changing fastest.  The maximum number of control
variables is 30.  Control variables begin on a new card with $C-B in column
1 and are enclosed in parentheses.

C.  The correlation matrices can be stored if parameter 3 is a temporary
storage address.  However, if control breaks are also being used,
only the first matrix corresponding to the first control break can be
saved.
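
The pairwise treatment of missing data can be illustrated with a short Python sketch. It assumes that, for each pair of variables, only the observations in which neither value equals the missing-data code are used; the function name `pairwise_correlation` and the code 99 are illustrative choices, not part of SOUPAC.

```python
import math

def pairwise_correlation(x, i, j, missing=99.0):
    """Correlation of columns i and j using only complete pairs.

    Returns (r, n_pairs). Rows where either value equals the
    missing-data code are skipped for this pair only.
    """
    pairs = [(row[i], row[j]) for row in x
             if row[i] != missing and row[j] != missing]
    n = len(pairs)
    sx = sum(a for a, _ in pairs)
    sy = sum(b for _, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    sxy = sum(a * b for a, b in pairs)
    cov = (n * sxy - sx * sy) / (n * (n - 1))
    sd_x = math.sqrt((n * sxx - sx * sx) / (n * (n - 1)))
    sd_y = math.sqrt((n * syy - sy * sy) / (n * (n - 1)))
    return cov / (sd_x * sd_y), n

# Two of the five rows contain the missing-data code 99.
rows = [(1.0, 2.0), (2.0, 4.0), (99.0, 5.0), (3.0, 6.0), (4.0, 99.0)]
r, n_pairs = pairwise_correlation(rows, 0, 1)
```

Because each coefficient can rest on a different subset of rows, the sample size returned with each correlation matters, which is the reason for the warning in comment A.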


PARTIAL  CORRELATION 


I.   General Description

This  routine,  upon  option,  provides  two  of  the  more  common  types  of 
special  purpose  correlation  coefficients. 

A.   Partial  Correlations: 

This  program  produces  coefficients  of  net  correlation  of  any  order 
from 1 to 19 in matrix form.  Coefficients of successively higher
order  may  be  obtained  by  repeated  calls  to  the  program,  each  time 
using  as  input  the  previously  generated  partial  correlation  matrix; 
or  several  variables  may  be  held  constant  at  the  same  time  by  one 
call  to  the  program. 

The  general  equation  used  is: 


$$r_{ij.abc \ldots n} =
  \frac{r_{ij.abc \ldots (n-1)} - r_{in.abc \ldots (n-1)}\; r_{jn.abc \ldots (n-1)}}
       {\left(1 - r_{in.abc \ldots (n-1)}^2\right)^{1/2} \left(1 - r_{jn.abc \ldots (n-1)}^2\right)^{1/2}}$$
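
The recursion in the general equation can be sketched as follows. This is an illustrative Python function, not the program's own code; `partial_correlation` and its argument names are hypothetical, and `r` is assumed to be a zero order correlation matrix stored as a list of lists.

```python
import math

def partial_correlation(r, i, j, held):
    """Partial correlation of variables i and j with `held` held constant.

    Applies the recursion one held variable at a time: the order of the
    coefficient drops by one at each level until zero order is reached.
    """
    if not held:
        return r[i][j]
    n = held[-1]
    rest = held[:-1]
    r_ij = partial_correlation(r, i, j, rest)
    r_in = partial_correlation(r, i, n, rest)
    r_jn = partial_correlation(r, j, n, rest)
    return (r_ij - r_in * r_jn) / math.sqrt((1 - r_in ** 2) * (1 - r_jn ** 2))

# Zero order correlations among three variables.
R = [[1.0, 0.5, 0.6],
     [0.5, 1.0, 0.4],
     [0.6, 0.4, 1.0]]
# First order partial correlation of variables 0 and 1, holding variable 2.
r01_2 = partial_correlation(R, 0, 1, [2])
```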

Reference:

Mills, F. C., Statistical Methods, Holt, Rinehart and Winston,
New York, 1955, 3rd edition.

B.   Tetrachoric Correlations:

This  type  of  correlation  coefficient  is  used  when  continuous  normally 
distributed  variables  are  measured  dichotomously. 

This  program  is  based  on  a  program  by  Roald  Buhler  at  Princeton 
University  which  in  turn  is  based  on  a  650  program  written  at  the 
Educational  Testing  Service.   The  approximation  used  was  developed 
by  Professor  Ledyard  Tucker. 


II.   Restrictions 


Partial  Correlations : 

Input matrices may be no larger than 140 x 140 and must be compatible
with SOUPAC conventions.  In most cases the original input to the
program will be a matrix of zero order correlations (see CORRELATION
program write-up).

Tetrachoric  Correlations : 

This option is limited to 140 variables.  All observations should be
coded  either  0  or  1.   The  program  generates  cross-count  tables  before 
computing  the  correlation  coefficients. 
III.   Parameters

The program name, PARTIAL CORRELATION, should be followed by the
following parameters:

Parameter
Number        Use or Meaning

1             Input address of R if partial correlations or raw
              data if tetrachoric correlations.  (R is a
              correlation matrix).

2             Output Address of correlations desired.

3             0 if tetrachoric correlations are desired
              1 if partial correlations are desired

4-22          Variables to be held constant in using
              partial correlations.


IV.   Special  Comments 

When a cell of the cross-count table is zero, or sufficiently close to
zero that the tetrachoric correlation cannot be computed by this
approximation, a value of -1.0 is used if the empty cell is off-diagonal.
If a diagonal cell is zero or nearly so (i.e., if a variable is all zero
or all one), its correlations are set to 0.0.

Blanks  are  counted  as  zeroes. 

V.   Examples

A series of observations of 8 variables are used to obtain 3rd
order partial correlations with variables 5, 7, and 8 held constant:

/*ID
//  EXEC  SOUPAC
//SYSIN  DD  *
CORRELATIONS  (CARDS)( )(SEQ 1).
PARTIAL  CORRELATION  (SEQ1)(PRINT)(1)(5)(7)(8).
END  SOUPAC
DATA(8)(8F6.2)

END#
/*


STEP-WISE  MULTIPLE  CORRELATION 

I.   General  Description 

The  STEP-WISE  MULTIPLE  CORRELATION  program  calculates  the  following: 

Mean:

$$\bar{X}_i = \frac{\sum X_i}{N}$$

Covariance:

$$s_{ij} = \frac{N \sum (X_i X_j) - \left(\sum X_i\right)\left(\sum X_j\right)}{N(N - 1)}$$

Standard Deviation:

$$s_i = (s_{ii})^{1/2}$$

Product Moment Correlation:

$$r_{ij} = \frac{s_{ij}}{s_i s_j}$$

In  the  step-wise  procedure,  intermediate  results  are  used  to  give 
valuable  statistical  information  at  each  step  in  the  calculation.   These 
intermediate  answers  are  also  used  to  control  the  method  of  calculation. 
A  number  of  intermediate  regression  equations  are  obtained  by  adding  one 
variable  at  a  time  thus  giving  the  following  intermediate  equations. 

a.  Y = B0 + B1X1

b.  Y = B0 + B1X1 + B2X2 ,  etc.

The  coefficients  for  each  of  these  intermediate  equations  and  the 
reliability  of  each  coefficient  are  obtained  by  the  step-wise  procedure. 
The  values  and  reliability  may  vary  with  each  subsequent  equation.   The 
coefficients  represent  the  best  values  when  the  equation  is  fitted  by  the 
variables  included  in  the  equation.   The  variable  is  added  that  makes  the 
greatest  improvement  in  "goodness  of  fit"  or,  stated  another  way,  gives  the 
greatest  reduction  in  variance  of  the  dependent  variable. 

A  variable  may  be  indicated  to  be  significant  at  an  early  stage  and 
enter  the  regression  equation.   After  several  other  variables  are  added  to 
the  regression  equation,  a  variable  in  the  equation  may  be  indicated  to  be 
insignificant.   Under  this  situation  the  step-wise  regression  procedure  will 
remove  the  insignificant  variable  before  adding  an  additional  variable. 
Thus,  at  the  various  steps  in  the  regression  procedure,  only  those  variables 
which  are  significant  will  be  included  in  the  regression  equation. 

The  F  level  to  enter  a  variable  controls  when  variables  enter  the 
equation  and  the  F  level  to  remove  a  variable  likewise  controls  the  removing 
of  variables  from  the  equation. 
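
The entry rule can be illustrated for the very first step only, where the variable giving the greatest reduction in variance is the one with the largest squared correlation with the dependent variable. The sketch below is an assumption-laden illustration, not the program's algorithm: the function name `first_step` is hypothetical, and the F value used is the standard one for a single predictor, F = r^2 (N - 2) / (1 - r^2). Later steps, and the removal test, require the matrix procedure described by Ralston and Wilf.

```python
def first_step(r_xy, n, f_enter):
    """Pick the first variable to enter a step-wise regression.

    r_xy    : correlations of each independent variable with Y
    n       : number of observations
    f_enter : "F" level to enter a variable
    Returns (index of best variable, its F value, whether it enters).
    """
    best = max(range(len(r_xy)), key=lambda idx: r_xy[idx] ** 2)
    r2 = r_xy[best] ** 2
    f_value = r2 * (n - 2) / (1.0 - r2)
    return best, f_value, f_value >= f_enter

# Two candidate variables, 27 observations, F-to-enter of 4.0.
best, f_value, enters = first_step([0.3, -0.6], 27, 4.0)
```

Here the second variable is chosen because its squared correlation (0.36) is larger, and it enters because its F value exceeds the F level to enter.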

The last step in the step-wise procedure predicts the value of the
dependent variable for each set of observations based on the final
regression equation.  Deviations between the actual and predicted values
are also calculated.  (See parameter 4).

For reference to formulas and methods used see:

A. Ralston and H. S. Wilf, Mathematical Methods for Digital
Computers, New York, Wiley and Sons, 1960, pp. 191-195.

II.   Restrictions

The maximum number of independent variables in this program is 199.
The  dependent  variable  must  be  the  last  variable  of  each  row. 

The  input  data  to  this  program  may  come  from  any  source  conforming  to 
SOUPAC.   Output  may  be  PRINT  only. 

III.   Parameters

The  parameters  for  the  STEP-WISE  MULTIPLE  CORRELATION  program  appear 
on  the  program  call  card.   They  must  follow  the  program  name  in  this  order: 

Parameter
Number        Use or Meaning

1             Input Address.  CARDS or SEQUENTIAL 1-15.
              (See parameter 4 for special conditions).

2             "F" level to enter an independent variable
              into the regression equation.  An example
              would be:  *4.0*

3             "F" level to remove a variable from the
              regression equation.  An example would
              be:  *4.0*

4             This parameter should be set to 1 if the
              predicted dependent variables are to be
              calculated.  (If this option is needed,
              input data must not be from cards.)  0 or
              blank if not wanted.

5             1 if constant term in equation is assumed
              to equal zero (0).

6             1 if want to use weighting factor.  (If a
              weighting factor is used, it must be the
              last variable in the input data row.)

7             1 if intermediate steps of regression are
              not to be printed.

8             1 if do not want cross-product matrix printed.
              2 if input data is in the following form:

                         1  . . .  N      N+1

                    1                      M
                    .    CORRELATION       E
                    .      MATRIX          A
                    .                      N
                    N                      S

                   N+1    STANDARD       sample
                          DEVIATION       size

              (where N is the number of variables)

9             1 if do not want means and standard deviation
              printed.

10            1 if do not want covariance to be printed.

11            1 if do not want correlations to be printed.

12            Tolerance to be used to determine when
              singularities are assumed to occur.  If
              this parameter is left blank 10^-5 is used.
              If it is desired to change this parameter,
              the following would be used:  *1.E-10*
              where any number could be substituted for
              the 10.

13            Output (intermediate storage) of coefficients.

14            First (N) variables are placed in regression
              first.

IV.   Special  Comments 

The  dependent  variable  must  be  the  last  variable  in  the  input  row 
(unless  a  weighting  factor  is  used,  then  the  dependent  variable  will  be  the 
next  to  the  last  variable  in  the  input  row) . 

Negative  F-ratios  may  sometimes  result  in  the  computational  procedure. 
They  should  be  considered  to  be  analytically  zero.   Frequently  negative 
F's  arise  when  input  is  a  missing  data  correlation  matrix.   Results  up  to 
the  negative  F  are  always  correct. 


The standard error of Y given at each step is the standard error of
predicted Y.


The  program  will  loop  if  a  variable  entered  at  one  step  is  removed 
at  the  very  next  step.   This  can  usually  be  corrected  by  changing  the 
F-levels  to  enter  and  remove  variables. 

V.   Example 


STEP(CARDS)*4.**4.*( )( )( )( )(1).


DISTRIBUTION  ANALYSIS  PACKAGE 


FIT 
(CHI-SQUARE  GOODNESS-OF-FIT  TEST)


I.   General  Description 

In  statistical  applications,  it  is  frequently  the  case  that  certain 
assumptions  were  made  concerning  the  probability  distribution  of  a  random 
variable.   A  frequent  assumption  is  that  a  particular  variable  is  normally 
distributed.   The  question  that  arises  is  "how  valid  an  assumption  was 
this?"  A  method  of  testing  this  assumption  (or  hypothesis)  is  the  CHI- 
SQUARE  GOODNESS-OF-FIT  TEST. 

This program provides tests of the hypothesis that the user's data is
(1) a random sample from the distribution P_θ, where θ is a user specified
parameter, or (2) a random sample from a class, P_θ, of distributions where
θ is not specified.  These will be called tests of type 1 and type 2,
respectively.

Distributions  which  can  currently  be  tested  in  the  program  are  the 
binomial,  Poisson,  normal,  gamma  and  continuous  rectangular  distributions. 
The  user  may,  for  example,  wish  to  test  the  hypothesis  that   his  data  was 
from  a  normal  distribution  with  variance  1,  for  some  mean. 

The program, on the basis of the user provided information, decides on
a set of points X_1 < X_2 < ... < X_n in the range of the distribution
under consideration.  The test then compares the observed numbers, o_i, of
observations in each interval [X_{i-1}, X_i), with the expected numbers

$$e_i = m \, P_{\hat{\theta}}\{X_{i-1} \le Y < X_i\}$$

where {Y_i ; 1 ≤ i ≤ m} are the m observations in the sample, and where
θ̂ is the user specified value of θ, if it was specified, and if not, θ̂ is
the maximum likelihood estimator for a hypothesis of form (2) above.  The
comparison of o_i with e_i is made by the statistic

$$X^2 = \sum_{i=1}^{n+1} \frac{(o_i - e_i)^2}{e_i}$$

This statistic has an approximate χ² distribution when the hypothesis is
true and the number of observations is large.  (See Billingsley, 1961).

The program prints the computed value of X² and the number of degrees of
freedom for the test.
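
The statistic itself is simple to compute once observed and expected counts are in hand. The following sketch shows only that final step; the function name `chi_square_statistic` is hypothetical, and the choice of intervals, the estimation of θ and the degrees of freedom are left to the program.

```python
def chi_square_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Sixty observations spread over three intervals of equal probability.
x2 = chi_square_statistic([18, 22, 20], [20.0, 20.0, 20.0])
```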

If  one  sample,  with  m  observations,  is  to  be  tested,  input  to  the  program 
from  cards  or  sequential  will  be  one  variable  with  m  observations.   More 
than  one  sample  may  be  tested  at  a  time,  if  the  same  distribution,  or  in  some 
cases  a  distribution  from  the  same  class,  is  the  one  being  considered.   Thus 
a  test  of  each  of  three  samples,  all  of  which  are  on  the  same  sequential 
storage  or  card  deck,   may  be  performed. 
II.   Distributions Available


NOTATION:  In the description below f(x) denotes the value of the
probability density function at the point x.  F(x) is the cumulative
distribution function defined as Pr {X ≤ x}.

The distributions are classified here as being "discrete" (i.e.
having positive probability at a countable number of values of the
random variable they describe) or as "continuous" (having positive
density over a continuous range of values of the random variable).


A.   DISCRETE DISTRIBUTIONS

1.   BINOMIAL(N,P)    N a positive integer;  0 ≤ p ≤ 1

$$f(x) = \binom{N}{x} p^x (1-p)^{N-x} \qquad \text{for } x \text{ an integer such that } 0 \le x \le N$$

$$F(x) = \sum_{i=0}^{x} \binom{N}{i} p^i (1-p)^{N-i} \qquad \text{for } x \text{ an integer such that } 0 \le x \le N$$


This distribution is appropriate, for example, in situations
where a random variable is sampled independently (observed) N times,
and where the values of the random variable can be classified
precisely as being in one of two sets often denoted "success" and
"failure," with probabilities p and q = 1 - p respectively.

2.   POISSON(λ)     λ > 0

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!} \qquad x \ge 0 \text{ and } x \text{ an integer}$$

$$F(x) = \sum_{i=0}^{x} \frac{\lambda^i e^{-\lambda}}{i!} \qquad x \ge 0$$

The  Poisson  distribution  is  often  used  as  an  approximation 
to  the  binomial  when  the  number  of  observations  is  large, 
p  =  Pr  {success}   is  small,  and  the  product  n  *  p  is  essentially  a 
constant.   The  Poisson  is  a  pervasive  distribution  in  its  own 
right,  arising  in  situations  where  the  probability  of  no  occurrences, 
or  of  one  occurrence  of  a  phenomenon  in  a  unit  of  time  is  moderate, 
where  the  probability  of  more  than  one  occurrence  is  essentially 
negligible  in  comparison,  and  where  the  frequencies  of  occurrence  in 
adjacent  intervals  are  independent  of  each  other. 
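
The two discrete distributions, and the use of the Poisson as an approximation to the binomial, can be checked numerically. The Python sketch below is illustrative only; the function names are hypothetical and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """f(x) for BINOMIAL(N, P)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """F(x) for BINOMIAL(N, P)."""
    return sum(binomial_pmf(i, n, p) for i in range(x + 1))

def poisson_pmf(x, lam):
    """f(x) for POISSON(lambda)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_cdf(x, lam):
    """F(x) for POISSON(lambda)."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))

# Many trials, small p, product n * p = 2: the two values nearly agree.
exact = binomial_pmf(2, 1000, 0.002)
approx = poisson_pmf(2, 2.0)
```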
B.   CONTINUOUS DISTRIBUTIONS

1.   NORMAL(μ, σ²)    -∞ < μ < +∞ ;  σ² > 0

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad -\infty < x < +\infty$$

$$F(x) = \int_{-\infty}^{x} f(t)\, dt \qquad -\infty < x < +\infty$$


Through  the  Central  Limit  Theorem,  the  use  of  this  distribution 
has  been  justified  to  describe  a  tremendous  variety  of  phenomena  in 
which  the  random  variable  under  consideration  is  assumed  to  be  the 
sum  of  a  large  number  of  independent  random  variables ,  each  having 
a  small  contribution  to  the  total. 


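2.   GAMMA(α, β)    α > 0 ,  β > 0

$$f(x) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} \qquad \text{for } x > 0$$

$$F(x) = \int_{0}^{x} \frac{\beta^{\alpha} t^{\alpha-1} e^{-\beta t}}{\Gamma(\alpha)}\, dt$$

Here Γ(α) = (α-1) Γ(α-1), and so if α is a positive integer,
Γ(α) = (α-1)!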

The gamma distribution is the sampling distribution for the
sum of α independent identically distributed "negative exponential"
random variables, with β = λ where λ is the parameter for the
negative exponential distribution.  The negative exponential
distribution is itself the special case gamma(1, β).  The gamma
distribution is used in dealing with waiting times, where the expected
frequency in a given interval has Poisson distribution.  The
"Chi-Square" distribution is another important special case of the
gamma distribution where

χ²(r) = gamma(r/2, 1/2).


3.   RECTANGULAR(θ1, θ2)    θ1 < θ2

$$f(x) = \frac{1}{\theta_2 - \theta_1} \qquad \theta_1 \le x \le \theta_2$$

$$F(x) = \int_{\theta_1}^{x} \frac{1}{\theta_2 - \theta_1}\, dt = \frac{x - \theta_1}{\theta_2 - \theta_1}$$

This distribution is appropriate where any event in an
interval [θ1, θ2] has equal probability of occurring, but no
occurrences will be outside the interval.

III.   Use  of  the  Program 

A.  PRELIMINARY 

The  program  is  invoked  by  the  main  parameter  card  (a  SOUPAC 
"Program  Card").   Information  concerning  the  distribution  and 
intervals  to  be  used  are  supplied  on  subparameter  cards  for  the 
program. 

Input  is  in  the  form  of  column  vectors,  and  may  be  from  cards  or 
sequential storage.  Each variable (vector of data) will be tested
against  the  same  type  of  distribution,  i.e.  information  provided 
on  the  distribution  card  applies  to  all  variables. 

The  subparameter  card  provides  the  means  of  supplying  information 
to  the  program  concerning  the  distribution  to  be  used,  as  well  as 
the  number  and  size  of  intervals  to  be  used  in  the  tests.   In 
some  cases  a  distribution  parameter  must  be  supplied  by  the  user. 

B.  MAIN  PARAMETER  CARD 

Immediately  following  the  program  name  CHI-SQUARE  GOODNESS-OF-FIT 
(mnemonic:   FIT),  the  following  parameters  are  listed. 


Parameter
Number        Use or Meaning

1             Input Address.  May be CARDS or
              SEQUENTIAL (S1, or S2, etc).


C.  SUBPARAMETER CARDS


1.  GENERAL  DESCRIPTION 

One "distribution card," chosen from the list below,
should appear with each call to the program.  For tests of
type 1 ("simple hypothesis") the parameters of the distribution
are specified on this card.  For distributions allowing tests
of type 2 ("composite hypothesis") one or more of these
parameters may be left blank.  Other parameters on the card
relate to intervals to be used for determining observed and
expected frequencies.  Because this test is only "asymptotically
valid," the intervals should be chosen so that the expected
frequency of any interval does not fall below 5.

2.  DISTRIBUTION  CARDS 

Where  *  *  is  used,  a  parameter  requires  a  decimal  point 
number;  when  (  )  appears  an  integer  is  required. 
DISTRIBUTION

BINOMIAL(M)*P*(N).

M and P are parameters of the distribution, P the probability
of success on a trial, M the number of trials.  N is the number
of points (integral values starting at 0) to be grouped in each
interval.  For a test of type 2, the parameter P should be
left blank.

POISSON*λ*(ENDPOINT)(N).

λ is the Poisson density parameter, and may be left blank for
a test of type 2.  N is the number of adjacent points to be
grouped in each interval until ENDPOINT is reached.  The
interval [ENDPOINT, +∞) is then the last interval.

RECTANGULAR*θ1**θ2*(N).

θ1 and θ2 are the endpoints for the range of the distribution,
θ1 < θ2.  N is the number of equal sized intervals to use for
the test.  Only tests of type 1 are allowed with this distribution.

NORMAL*μ**σ²**STPT**EP*(N).

μ is the mean of the distribution, σ² is the variance.  Either
or both of the parameters may be left blank for a test of
type 2.  STPT and EP are the start point and end point of an
interval to be broken up into N equal sized intervals for
the test.  The additional intervals (-∞, STPT) and [EP, +∞)
will be used.

GAMMA*α**β**ENDPOINT*(N).

*α* is the degrees of freedom parameter, and must be specified.
*β* is the density parameter and may either be specified for
a type 1 test or left out for a type 2 test.  The portion of the
real line between 0 and ENDPOINT will be divided into N equal sized
intervals for the test.  One additional interval, [ENDPOINT, +∞),
will be included.

IV.   Restrictions 

A  maximum  of  50  input  variables  (each  variable  a  sample  to  be  tested) 
may  be  input  to  the  program.   Only  one  distribution  or  distribution  type 
(e.g.  NORMAL  with  variance  1  and  any  mean)  may  be  tested. 
V.   Examples

/*ID
//  EXEC  SOUP
//SYSIN  DD  *
FIT(CARDS).
NOR*2.0****-10**14*(24).
END  P
END  S
DATA(1000,1)(21X,F4.1)

#END
/*

This program will test the hypothesis that the sample is from a normal
distribution with a mean of 2, for some variance.  The interval [-10, 14)
will be divided into 24 pieces, each of length 1, for the test.


/*ID
//  EXEC  SOUP
//SYSIN  DD  *
FIT(CARDS).
BIN(10)*1/2*(2).
END  P
END  S
DATA(600,1)(26X,F2.0)

#END
/*

This program will test the simple (type 1) hypothesis that the sample was
from a B(10, 1/2) population.  Note that with the given number (600) of
observations we had to group adjacent pairs of points into the same interval
to ensure a reasonable expected frequency in every cell.
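
The need for grouping in this example can be seen by computing the expected frequencies directly. The sketch below is illustrative; `expected_counts` is a hypothetical helper that simply groups adjacent points starting at 0, and how the program itself treats a leftover point at the upper end is not stated here.

```python
import math

def expected_counts(m, n_trials, p, group):
    """Expected frequencies for m observations from BINOMIAL(n_trials, p),
    with `group` adjacent points (starting at 0) in each interval."""
    probs = [math.comb(n_trials, x) * p ** x * (1 - p) ** (n_trials - x)
             for x in range(n_trials + 1)]
    return [m * sum(probs[start:start + group])
            for start in range(0, n_trials + 1, group)]

singles = expected_counts(600, 10, 0.5, 1)  # one point per interval
pairs = expected_counts(600, 10, 0.5, 2)    # pairs of points, as in the example
```

Without grouping, the interval containing only 0 has an expected frequency of about 0.59, well below 5; grouping 0 and 1 together raises it to about 6.45.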


THE  KOLMOGOROV-SMIRNOV  STATISTIC

I.   General  Description 

The  program  computes  the  Kolmogorov-Smirnov  (K-S)  D  statistic, 


$$D = \sup_{\text{all } x} \left| F_N(x) - F(x) \right|$$

where F_N(x) is the sample cumulative distribution for a sample of size N
and F(x) is the specified cumulative distribution.

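II.   Theoretical Discussion

The sample distribution function:

$$F_N(x) = \frac{j}{N}, \qquad j = 0, 1, \ldots, N$$

(where j is the number of sample observations less than or equal to x)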

will  generally  differ  from  the  population  distribution  function.   If 
the  sample  distribution  differs  exceedingly  from  the  specified 
distribution  F(x),  the  amount  of  the  difference  might  be  of  use  in 
determining  whether  to  accept  the  hypothesized  distribution  as  correct. 
The  Kolmogorov-Smirnov  test  uses  the  maximum  actual  numerical 
difference |F_N(x) - F(x)|.

Example : 

Consider  testing  the  hypothesis  that  a  distribution  is  normal  with 
mean = 32 and variance = 3.24 with 10 sample observations

31.0  33.7

31.4  34.4

33.3  34.9

33.4  36.2

33.5  37.0

F_N(x) and F(x) are sketched below.

[Figure: sketch of the sample distribution F_N(x) and the hypothesized
distribution F(x).]

D = .56, which is the maximum of |F_N(x) - F(x)|.

At the .95 confidence level the critical value is D' = .40925.*

Since D > D', the distribution being tested is rejected at the 5% level.

*  See  Owen  [5] 
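
The worked example can be reproduced with a short calculation. The following is a minimal Python sketch, not the SOUPAC routine itself: it evaluates the normal distribution function with `math.erf` and takes the largest gap between F(x) and the sample step function on either side of each observation. The function names are hypothetical; the critical value .40925 is the one quoted above.

```python
import math

def normal_cdf(x, mean, variance):
    """Cumulative normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))

def ks_statistic(sample, cdf):
    """Largest absolute difference between the sample step function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_N jumps from i/n to (i+1)/n at x, so check both sides of the step.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

sample = [31.0, 31.4, 33.3, 33.4, 33.5, 33.7, 34.4, 34.9, 36.2, 37.0]
d = ks_statistic(sample, lambda x: normal_cdf(x, 32.0, 3.24))
critical = 0.40925  # tabled critical value for N = 10 quoted in the text

print(round(d, 2), d > critical)
```

The maximum occurs just below the third ordered observation (33.3), where F(x) is already about .76 while the sample distribution has only reached .2.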

III.   Notes 

Supply only the parameters needed to determine the specified
distribution.  This program, at this time, calculates the Kolmogorov-
Smirnov statistic for the following distributions if given the
proper parameters.

A.  Normal 

1.  mean 

2.  variance 

B.  Chi-Square 

1.  Degrees  of  freedom 

C.  Central  F 

1.  Degrees  of  freedom  numerator 

2.  Degrees  of  freedom  denominator 

D.  Noncentral  F 

1.  Degrees  of  freedom  numerator 

2.  Degrees  of  freedom  denominator 

3.  Noncentrality parameter λ

where the noncentrality parameter is

$$ \lambda = \frac{1}{2} \sum_{i=1}^{n} \mu_i^2 $$

(See Graybill [3] for further development of λ).

Additional  distributions  are  planned  for  the  near  future.   Please 
see  consultants  for  these  additions. 

IV.   Sample  Programs 

A.   Normal  -  type  1 

μ = 32
σ² = 3.24

K-S(C)(1)*32.**3.24*.


B.  Chi-Square  -  type  2 

d.f. = 24

K-S(C)(2)****(24).

C.  Central  F  -  type  3 

d.f.  numerator  =  5 
d.f.  denominator  =  16 

K-S(S1)(3)****(5)(16).

D.  Noncentral F - type 3

d.f. numerator = 10
d.f. denominator = 13
noncentrality parameter (λ) = 2.5

K-S(S2)(3)****(10)(13)*2.5*.

V.   Parameters 

The  following  parameters  follow  the  mnemonic  K-S : 

Parameter
Number    Use or Meaning

1         Input address: CARDS or SEQUENTIAL 1-15.

2         Specified distribution (see Section IV).

3         Floating point value of the mean.

4         Floating point value of the variance.

5         Degrees of Freedom (numerator).

6         Degrees of Freedom (denominator).

7         Floating point value of the noncentrality parameter.

VI.   References 

[1]  Darling, D. A., "The Kolmogorov-Smirnov, Cramer-Von Mises Tests,"
     Annals of Mathematical Statistics, Vol. 28, (1957), pp. 823-838.

[2]  Goodnight, James, Department of Experimental Statistics, N. C.
     State University, Raleigh, North Carolina.  Mr. Goodnight
     developed and programmed the computational procedure used to
     integrate the central and noncentral F distribution.

[3]  Graybill, F. A., An Introduction to Linear Statistical Models,
     Vol. I, Chapter 4, "Distribution of Quadratic Forms," pp. 74-92.

[5]  Owen, D. B., Handbook of Statistical Tables.  This book contains
     a table of the critical values for the Kolmogorov-Smirnov
     statistic.  It is available for reference in the SOUPAC office.


FACTOR  ANALYSIS  PACKAGE 



Users  of  the  PRINCIPAL  AXIS  FACTOR  ANALYSIS  program  are  reminded  that 
if  no  estimates  of  communalities  are  provided  with  the  input  matrix,  the 
output  is  a  principal  components  matrix.   If  communalities  are  provided, 
then  the  output  is  principal  axis  factors.   Whether  communality  estimates 
are used, and if so, what kind, can be a very important conceptual
consideration.  Mathematically, the principal components approach the principal axis
factors  for  large  numbers  of  variables.   The  rate  of  convergence  is  the  ratio 
of  N2  to  N3. 

Centroid  factor  analysis  similarly  requires  that  a  user  consider  the 
question  of  communality  estimates. 

The ITERATIVE FACTOR ANALYSIS program, on the other hand, simultaneously
estimates communalities and factor loadings.

The  UNRESTRICTED  MAXIMUM  LIKELIHOOD  procedure  requires  as  input  a 
correlation  matrix  with  unities  on  the  diagonal. 

For principal components, component scores are derived by the equation
Z R^{-1} F = X, where Z is the matrix of standardized raw scores, R^{-1} the
inverse of the correlation matrix, and F the component loadings.

In general, factor scores from oblique rotations are obtained by the
formula X = D Y^{-1} M^{-1} F'Z, where D is the correlation between components
and the reference vectors; Y the reference vector transformation; M the roots;
F the principal components; and Z the standardized raw scores.  For the
orthogonal case, the general formula is X = T M^{-1} F'Z, where T is a
transformation matrix, for example, the varimax loadings.  Notice that if T is
the identity matrix we have the formula for component scores.

These general formulas are also applicable to principal axis factors.


BINORMAMIN 

I.   General  Description 

BINORMAMIN  ROTATION  rotates  a  matrix,  F,  of  orthogonal  factor  loadings 
to  oblique  simple  structure. 

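It does this by iterating for T in FT = A (where A is the rotated factor
pattern) so as to minimize:

$$ K = \sum_{p} K_p, \qquad K_p = \sum_{q \neq p} \frac{\sum_j (v_{jp}^2 / h_j^2)(v_{jq}^2 / h_j^2)}{\left(\sum_j v_{jp}^2 / h_j^2\right)\left(\sum_j v_{jq}^2 / h_j^2\right)} $$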

Since  solving  directly  for  K  is  too  complex,  BINORMAMIN  takes  one  vector  at 
a  time,  rotating  it  against  all  the  others,  to  minimize  each  Kp. 

Its name comes from the fact that it uses a double (BI) NORMAlization
in seeking a MINimum.
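
The criterion can be evaluated directly for a given set of loadings. The following is a minimal Python sketch under stated assumptions: v_jp is taken to be the loading of variable j on rotated vector p, and h_j^2 the communality of variable j, supplied by the caller. Neither symbol is defined in the text above, and the function name is hypothetical. The sketch only evaluates K and each K_p; it does not perform the rotation.

```python
def binormamin_criterion(V, h2):
    """Return ([K_p for each vector p], K) for loadings V and communalities h2."""
    n, k = len(V), len(V[0])
    # Doubly normalized squared loadings u_jp = v_jp^2 / h_j^2.
    U = [[V[j][p] ** 2 / h2[j] for p in range(k)] for j in range(n)]
    col = [sum(U[j][p] for j in range(n)) for p in range(k)]
    K_p = []
    for p in range(k):
        total = 0.0
        for q in range(k):
            if q == p:
                continue
            cross = sum(U[j][p] * U[j][q] for j in range(n))
            total += cross / (col[p] * col[q])
        K_p.append(total)
    return K_p, sum(K_p)

# Perfect simple structure: each variable loads on one vector only.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
h2_simple = [sum(v * v for v in row) for row in simple]
_, K_simple = binormamin_criterion(simple, h2_simple)

# Every variable loads equally on both vectors.
mixed = [[0.6, 0.6], [0.5, 0.5]]
h2_mixed = [sum(v * v for v in row) for row in mixed]
_, K_mixed = binormamin_criterion(mixed, h2_mixed)

print(K_simple, K_mixed)
```

Loadings with no overlap between vectors give K = 0, which is the minimum the rotation seeks; loadings shared equally between vectors give a larger value.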

For  further  information  see : 

1.  Kaiser, H. F. and Dickman, K. W., "Analytic Determination of
    Common Factors".  Unpublished manuscript, University of
    Illinois, 1959.

2.  Harman, H. H., Modern Factor Analysis.  Chicago, University
    of Chicago Press, 1960.  pp. 326ff.

II.   Restrictions 

Input  is  limited  to  matrices  of  150  x  30  or  less. 

III.   Parameters

After  the  program  name,  BINORMAMIN,  are  the  following  parameters: 

Parameter
Number    Use or Meaning

1         Input Address of factor matrix, F.

2         Output Address of factor matrix, F.

3         Output Address of the transformation matrix, T.

4         Output Address of the reference vector structure, V.

5         Output Address of the correlations between reference vectors.

6         Output Address of the primary factor pattern, P.

7         Output Address of the correlations between factors.

8         Maximum number of iterations (see note).  If blank, the
          maximum will be set at 100 iterations.

9         Convergence criterion (see note).

          A.  Defined zero change: iterating will stop when each
              element in V changes by less than A.  (A must be less
              than .2 and no less than .0000001).

          B.  Defined zero rotation: iterating will stop when each
              vector in T changes by less than θ, where θ is the angle
              whose cosine is B.  (B must be less than 1.0 and no less
              than .2).

          If left blank, A will be set to .001.

10        If an initial T is to be read in, input address of T.

11        Output Address of the initial T.

Note on Output:

A.  Any output option left blank will not be output.

B.  The program will always print out the program name and the number
    of iterations actually done.

C.  The program will print out the largest change in V, unless option
    9B is used.

D.  All data printed out is to 7 decimal places.

Note on parameters 8 and 9:  The program will stop at whichever criterion
it meets first.

Note on parameter 9:  This parameter is a floating point constant and
therefore must be enclosed in asterisks, with a decimal point, as in the
example:

Example:   BINORMAMIN(CARDS)( )( )(SEQ 1/PRINT)( )( )(PRINT)( )*.0001*.

This stores V on SEQ 1 and also prints V and the correlations between factors.
Iteration stops when no element in V changes by more than .0001, or when the
maximum of 100 iterations is reached.


CENTROID FACTOR ANALYSIS


I.   General Description

CENTROID FACTOR ANALYSIS computes a set of f linearly independent
vectors (factors) which are mutually uncorrelated.  Normally, a factor
analysis decomposes a matrix of correlations, R_n, into a set of f
factors.  The factors are arrayed as column vectors in the factor matrix,
F, such that

$$ R_n = FF' + R_{n-f} $$

where R_{n-f} is the matrix of residual effects.  The k-th factor is computed
by dividing the column sums of R_{k-1}, the residual matrix remaining after
k-1 factors have been extracted, by the square root of the total sum of the
elements of R_{k-1}:

$$ f_{i,k} = \frac{\sum_j r_{ij}^{(k-1)}}{\sqrt{\sum_i \sum_j r_{ij}^{(k-1)}}} $$

Between  each  factor  extraction,  the  variables  in  the  residual  matrix 
are  successively  reflected  until  all  the  column  sums  are  positive. 
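
One extraction step can be sketched as follows. This is a minimal Python illustration, not the CENTROID program: it extracts a single centroid factor from a small correlation matrix and forms the residual matrix. The reflection of variables between extractions is deliberately omitted, and the function names and the sample matrix are hypothetical.

```python
import math

def centroid_factor(R):
    """Loadings of one centroid factor: column sums / sqrt(total sum)."""
    n = len(R)
    col_sums = [sum(R[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    return [s / math.sqrt(total) for s in col_sums]

def residual(R, f):
    """Residual matrix R - f f' after removing one factor."""
    n = len(R)
    return [[R[i][j] - f[i] * f[j] for j in range(n)] for i in range(n)]

R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

f1 = centroid_factor(R)
R1 = residual(R, f1)
residual_col_sums = [sum(R1[i][j] for i in range(3)) for j in range(3)]

print([round(x, 4) for x in f1])
print([round(abs(s), 10) for s in residual_col_sums])
```

The column sums of the residual matrix vanish after each extraction, which is why the variables must be reflected before the next factor can be computed from column sums.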

For  more  detailed  discussion  see: 

1.  L. L. Thurstone, Multiple Factor Analysis, Chicago,
    University of Chicago Press, 1947, pp. 149-175.

2.  Harry Harman, Modern Factor Analysis, Chicago,
    University of Chicago Press, 1960, pp. 192-215.

II.   Restrictions

The input matrix for the CENTROID program must not exceed the dimensions
of 190 x 190.  The input matrix is further limited to being a square,
positive definite or semi-definite, symmetric matrix.  Commonly, correlation,
covariance, or cross-product matrices are used as input data.  Any attempt
to introduce communality estimates (change the diagonal elements) must be
made before data is passed to the CENTROID program.  A set of incorrectly
estimated communalities can make the matrix non-positive and could
conceivably cause the program to hang.

The  input  data  may  come  from  any  storage  medium  which  conforms  to  SOUPAC. 
Similarly,  the  output  codes  follow  the  established  conventions  and  are  at 
the  option  of  the  user. 

The input matrix may be completely factored (i.e., N factors from an
N variable matrix).  However, factoring may be stopped by any of three
criteria:

1.   The  user  may  specify  the  number  of  factors  to  be  extracted. 
This  criterion  provides  an  upper  limit  beyond  which  factoring 
will  not  be  done.   Consequently,  it  is  advisable  to  put  the 
maximum  value  on  this  limit  in  cases  where  it  is  not  the 
primary  criterion.   (Set  it  equal  to  the  number  of  variables). 

2.  The per cent of total variance removed from the R matrix is
a  second  limiting  criterion.  This  parameter  also  specifies 
an  upper  limit  to  the  process.  Therefore,  it  should  be  set 
at  100  per  cent  unless  it  is  the  criterion  for  stopping. 

3.  The  last  criterion  is  to  stop  when  the  factor  contribution  falls 
below  1.   The  use  of  this  procedure  is  dictated  by  the  presence 
or  absence  of  its  associated  parameter. 

If all three criteria are used simultaneously, factoring will be stopped by
whatever  criterion  is  met  first. 


III.   Parameters 


Following the program name on the program call card come the parameters
needed  by  the  program.   The  parameters  must  appear  in  the  order  below: 

Parameter
Number    Use or Meaning

1         Input Address.  CARDS or SEQUENTIAL 1-15.

2         Output Address.  SEQUENTIAL 1-15 and/or PRINT.

3         Maximum number of factors to be extracted.
          This must be less than or equal to the
          order of the input matrix.

4         Per cent of total variance to be removed,
          expressed as an integer between 0 and 100.

5         The presence of any number greater than 0
          in this parameter indicates that factoring
          should stop when the factor contribution
          falls below unity.

6         Output Address of Residual Matrix.

If parameters 3 and 4 are left blank, then by default they will be set to
their maximum possible values and a message will be printed.

The Residual Matrix must be stored before it can be printed.

Example:  Assume that you have 77 variables and that the correlation matrix
is stored on SEQ 1; then legal forms of the CENTROID call statement may be:

CENTROID(SEQ 1)(PRINT)(77)(100)(1).

CENTROID(SEQ 1)(P(F))(50)(80).

CENTROID(SEQ 1)(SEQ 2/P)(20)(100)(1)(SEQ 3/P).

CENTROID(SEQ 1)(SEQ 2/P(F))(45)(90)( )(SEQ 3/P(F)).

CENTROID(SEQ 1)(P).     In this case, number of factors = 77
and per cent of variance = 100 will be assumed by default.


COMMUNALITY ESTIMATION


I.   General  Description 

Five  methods  of  COMMUNALITY  ESTIMATION  are  offered  in  this  program. 
In  each  case  the  estimates  replace  the  diagonal  elements  of  the  matrix. 
They  are  as  follows : 

Code Number    Method

1              The element of largest absolute magnitude in
               each row replaces the diagonal element of the
               row.

2              The square of the multiple R of each variable
               with all others replaces the diagonal entry
               for that variable.  (See Special Comment Number 2.)

3              Communalities produced by another analysis
               are input from cards or another
               storage medium.

4              For each row i,

               $$ \left( \frac{\sum_{j=1}^{N} r_{ij}^2}{N} \right)^{1/2} $$

               replaces the diagonal entry for that row.
               This is the square root of the average
               square across the row.

5              For each row i,

               $$ \frac{r^*_{ik}\,(S_i - r^*_{ik})}{S_k - r^*_{ik}} $$

               replaces the diagonal entry for that row,
               where:

               $$ r^*_{ik} = \max_j \, \mathrm{abs}(r_{ij}) $$

               and

               $$ S_i = \sum_j \mathrm{abs}(r_{ij}), \qquad S_k = \sum_j \mathrm{abs}(r_{kj}) $$

               This method of COMMUNALITY ESTIMATION is due to
               Professor L. Tucker.
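
Two of the simpler methods can be illustrated in a few lines. The following is a minimal Python sketch of code numbers 1 and 4 only, not the SOUPAC program. Two points are assumptions: method 1 is taken to search the off-diagonal elements (the diagonal of a correlation matrix is 1.0), and method 4 is taken to average over all N elements of the row, diagonal included. The function names and the sample matrix are hypothetical.

```python
import math

def largest_off_diagonal(R):
    """Method 1: largest absolute off-diagonal element in each row."""
    n = len(R)
    return [max(abs(R[i][j]) for j in range(n) if j != i) for i in range(n)]

def root_mean_square(R):
    """Method 4: square root of the average squared element in each row."""
    n = len(R)
    return [math.sqrt(sum(r * r for r in row) / n) for row in R]

def replace_diagonal(R, estimates):
    """Return a copy of R with the estimates placed on the diagonal."""
    out = [row[:] for row in R]
    for i, h2 in enumerate(estimates):
        out[i][i] = h2
    return out

R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

m1 = largest_off_diagonal(R)
m4 = root_mean_square(R)
R_method1 = replace_diagonal(R, m1)

print(m1)
print([round(x, 4) for x in m4])
```

The matrix with its diagonal replaced is what would then be passed on to a factoring program such as CENTROID.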

II.   Restrictions 

Input  is  restricted  to  correlation  matrices  of  order  150  or  less. 

III.   Parameters

The parameters for the COMMUNALITY ESTIMATION program appear on the
program call card.  They must follow the program name in this order:

Parameter
Number    Use or Meaning

1         Input Address.  CARDS or SEQUENTIAL 1-15.

2         Output Address.  SEQUENTIAL 1-15 and/or PRINT.

3         Code Number of the method selected.  (See General Description.)

4         Input Address if Option 3 is used.

IV.   Special Comments


(1)  If the correlation matrix and communalities both are input from cards,
the correlation matrix precedes the communality estimates.  (See Code
Number 3 in the General Description.)

(2)  If the input correlation matrix is singular, or very nearly so,
squared multiple correlations computed by standard procedures may be subject
to considerable error, and will usually exceed unity for several variables.
For this reason, the user should be aware of the characteristics of the
matrix.  The program will check to see that all R^2 are less than or equal
to 1.0.  If this is not the case, execution will cease and a message will
be printed stating that the correlation matrix is virtually singular.
There is an alternative procedure devised by Ledyard Tucker for finding
R^2 in singular matrices.  Information is available in the SOUPAC office.


V.   Reference 

Harman, Harry:  Modern Factor Analysis, Chicago, University of Chicago
Press, 1960, pp. 83-90.


ITERATIVE  FACTOR  ANALYSIS 


I.   General  Description 

A.   Procedural 

This routine, upon option, provides one of four iterative
factorization methods:

1.  Alpha factor analysis (AFA, Kaiser, 1962)

2.  Canonical factor analysis (CFA, Rao, 1955, Harris, 1962)

3.  Stepwise maximum likelihood factor analysis (MLFA, Lawley, 1940)

4.  Iterative principal axis factor solution (IPRAX, Traditional)

All  four  methods  have  in  common  that  communalities  and  factor  loadings 
are  estimated  simultaneously.   In  three  cases  (AFA,  CFA,  IPRAX)  the 
number  of  factors  decision  can  be  made  beforehand  by  the  user,  or  it 
can  be  left  to  the  program,  in  which  case  appropriate  modifications  of 
Guttman's  lower  bound  criterion  will  be  used. 

The four methods differ from each other in theory with respect to the
defining criterion of optimization, and consequently they differ
technically with respect to the matrix that is diagonalized in each case.

1.   AFA  (Kaiser) 

Optimization criterion: maximize the alpha-reliabilities
(Cronbach) of the retained factors.  If the number of factors
decision is left to the program, the Kaiser modification of
the Guttman criterion will be used and all factors with
positive alpha-reliability will be iterated upon.

The diagonalization is on the matrix C in

$$ C = H^{-1}(R - U^2)H^{-1} \quad \text{so that} \quad C = Q \Theta Q' $$

where R is an n x n input matrix of covariances, H^2 is a diagonal
matrix of communalities, U^2 = I - H^2 is a diagonal matrix of
uniquenesses, and Q is an n x m matrix of latent vectors corresponding
to the m largest latent roots in Θ, which are used to recompute
new estimates of H^2 through F = H Q Θ^{1/2}.  An initial set of H^2
is provided by I - (diag(R^{-1}))^{-1}, which is equivalent to the
squared multiple correlations of R if R itself is a correlation
matrix.

Invariance under scaling: Kaiser has shown that the resulting
factors will be invariant under scaling, i.e., if a covariance
matrix R gives rise to a factor matrix F then the covariance matrix
SRS will give rise to a factor matrix SF (S diagonal).

Behavior of latent roots, alpha reliabilities: the n-m rejected
roots of C add to zero at each stage (i.e., C is non-Gramian);
the m accepted roots are simple functions of the alpha-reliabilities
of the retained factors in F.  These reliabilities will be output
by this sub-program.

2.   CFA (Rao, Harris)

Optimization criterion: maximize the correlations between m
linear combinations of the common parts of the variables with m
linear factors that are canonically correlated (Hotelling) with
the variables in the common factor space.  If the number of
factors decision is left to the program, the Harris modification
of the Guttman criterion will be used, leading to a Gramian
R - U^2 of minimum rank.

The diagonalization is on the matrix

$$ C = U^{-1}(R - U^2)U^{-1} \quad \text{so that} \quad C = Q \Theta Q' $$

where F = U Q Θ^{1/2} is used to recompute new estimates for U^2,
retaining the m largest roots of C in Θ.  The notation is the
same as in section 1 (AFA).  An initial set of U^2 is provided by
[diag(R^{-1})]^{-1}.

Invariance under scaling: the resulting factors are again invariant
under scaling as defined in section 1 (AFA).

Behavior of latent roots; chi-square criterion: Rao has shown
that the n-m rejected roots approach unity at convergence.  For
exact rank m data they will be "exactly" unity within the
tolerance of the convergence criterion ETA (see section III - B).
For data containing random error their departure from unity
provides a likelihood ratio test for the hypothesis that the
population matrix P - V^2 = GG', where P, V, G are population
parameters corresponding to R, U, F in the sample, is of rank m
or less.  A criterion for this test is computed by this
sub-program, which can be compared with two chi-square approximations
which are also output by this sub-program.  Note that such a
chi-square test is valid only if the iterative process has indeed
converged, as indicated by the maximal discrepancy between trial
vectors which is printed out for that purpose.

3.   MLFA (Lawley)

The  CFA  variant  of  the  program  can  be  used  for  a  step-wise  maximum 
likelihood  factorization  in  the  Lawley-Rao  sense. 

Optimization criterion: maximize the likelihood function
corresponding to the multivariate normal distribution with covariance
matrix parameters P = GG' + V^2 (as defined in section 2, CFA), given
the sample matrix R, under choice of G and V and observing the
side-conditions that P - V^2 is Gramian and V is diagonal with 0 < v_j < 1.

The diagonalization is the same as in CFA, section 2; hence, the
resulting factors are again invariant under scaling as defined in
section 1 (AFA).

In contrast to CFA, however, the number of factors decision is
made on statistical grounds.  The user would start with a
reasonable guess for m (preferably m < n/2 to ensure positive degrees
of freedom for the chi-square test, which otherwise will be
by-passed).  After convergence has been obtained, the user would
inspect the chi-square statistic.  If the statistic is below the
table value for the chosen probability level (.05 or .01), then
the hypothesis can be accepted at this level with corresponding
risk and the user has the option to reduce m for a second run,
etc.  On the other hand, if the adjusted statistic exceeds the
table value, then m must be raised until the adjusted statistic
warrants acceptance of the hypothesis.

Within the package the user is free to re-enter the routine
repeatedly with sequentially decreasing or increasing m specified on
the call card.  Since the test assumes convergence, it is
pertinent that the number of iterations allowed be large enough
for convergence to occur within the chosen tolerance bound ETA
(see section III - B).

4.   IPRAX (traditional)

Optimization criterion: none

The diagonalization is on the matrix

$$ C = R - U^2 \quad \text{so that} \quad C = Q \Theta Q' $$

where F = Q Θ^{1/2} is used to recompute H^2 = I - U^2, retaining
the m largest roots in Θ.  The notation is the same as in
section 1 (AFA).  An initial set of H^2 is provided by the
identity matrix.  If the number of factors decision is left
to the program, the unmodified Guttman criterion will be used,
i.e., all factors corresponding to roots of the input matrix
R which exceed unity will be retained.

Invariance under scaling, as defined in section 1 (AFA), is
not obtained by this method.

The behavior of the latent roots is not known at present.  No
statistical or other significance can be attached to the m
largest or n-m smallest roots of C.

5.   Both covariance matrices and correlation matrices are
acceptable as input.  If covariances are used the tenth parameter
should be 1.  In this case the covariance matrix is scaled into
a correlation matrix, and all computations, in particular the
number of factors decision, are based on this correlation matrix.
At the final stage the factors are scaled back so as to account
for the covariance matrix which was input.  The matrix of residuals
is computed in the metric of the covariances.

II.   Restrictions

Input  is  restricted  to  matrices  of  order  100  x  100  or  less.   Up  to 
50  factors  can  be  handled  by  this  program.   If  the  number  of  factors 
decision  is  left  to  the  program  and  more  than  50  factors  are  estimated, 
an  appropriate  message  will  be  printed  out  and  control  will  be  returned 
to  the  system. 

III.   Parameters 

The  program  name  is  ITERATIVE  FACTOR  ANALYSIS.   After  the  name  on 
the  call  card  the  parameters  must  appear  in  the  following  order: 

Parameter
Number    Use or Meaning

1         Input Address of data matrix.  CARDS or
          SEQUENTIAL 1-15.

2         Output Address of data matrix.  SEQUENTIAL 1-15
          and/or PRINT.

3         Output Address for principal axis factors.
          SEQUENTIAL 1-15 and/or PRINT.

4         Output Address of residual matrix.  SEQUENTIAL 1-15
          and/or PRINT.

5         Option code:   0 if IPRAX
                         1 if ALPHA
                         2 if CANONICAL
                         3 if STEP-WISE MAXIMUM LIKELIHOOD

6         Maximum number of cycles to be executed.
          If left blank, 50 cycles will be used as the
          upper limit.

7         Number of factors to be extracted.  If left
          blank, all factors with roots exceeding unity
          will be retained.

8         Exponent of convergence, n, where tolerance
          ETA = 10^-n.  If left blank, n = 3 or ETA = 10^-3.
          If all goes well, the program will stop as soon
          as either one of the stopping criteria is met.
          Error stops, if they occur, are labelled
          accordingly.

9         Sample size (for CFA only).  If left blank,
          the chi-square computations are by-passed.
          If specified, chi-square is computed with the
          sample size.

10        1 if input matrix was a covariance matrix.


A.  Output common to all four sub-programs

1.  Matrix output within system conventions:

a.  R (input covariance matrix)

b.  F (factor matrix)

c.  R-FF' (residual matrix)

2.  Vector output, print only:

a.  communality vector (last iteration)

b.  vector of latent roots of C (last iteration)

3.  Constant, print only:

a.  number  of  iterations  completed 

b.  largest discrepancy between trial vectors
(H^2, U^2, or H^{-1}, depending on sub-program)

c.  root mean square of off-diagonal residual matrix

d.  per cent of variance removed

B.  Additional output specific to sub-programs

AFA:   The alpha-reliabilities of the m retained factors

CFA:   chi-square statistic, chi-square approximations (Wilson,
Hilferty) for p = .05 and p = .01, for comparison with
statistic.  Degrees of freedom.

IV.   Special Comments

The  accuracy  should  be  approximately  6  digits  in  computations, 
possibly  somewhat  lower  for  a  very  large  number  of  iterations.   The 
effective  accuracy  depends  on  the  chosen  tolerance  ETA  and  the  actual 
convergence  as  indicated  by  the  largest  discrepancy  between  trial 
vectors.   The  chi-square  approximations  are  within  2  x  10"^  for  more 
than  8  degrees  of  freedom. 
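
The Wilson-Hilferty approximation mentioned above can be sketched as follows. This is a minimal Python illustration using the standard published form of the approximation; it is an assumption that the SOUPAC routine uses exactly this form. The function name is hypothetical, and the normal deviates 1.6449 and 2.3263 are the usual upper-tail values for p = .05 and p = .01.

```python
import math

# Upper-tail standard normal deviates for the two probability levels.
NORMAL_DEVIATE = {0.05: 1.6449, 0.01: 2.3263}

def wilson_hilferty(df, p):
    """Approximate upper-tail critical value of chi-square with df degrees."""
    z = NORMAL_DEVIATE[p]
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

approx_05 = wilson_hilferty(10, 0.05)
approx_01 = wilson_hilferty(10, 0.01)
print(round(approx_05, 2), round(approx_01, 2))
```

For 10 degrees of freedom the tabled values are 18.307 and 23.209, so the approximation is close enough to serve as a yardstick for the computed statistic.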


JACOBI 


I.   General Description

This  program  calculates  eigenvalues  and  eigenvectors  of  a  square, 
symmetric  matrix,  using  the  JACOBI  rotating  technique.   This  program  is 
limited  to  matrices  of  110  rows  and  110  columns.   The  user  should  realize 
that  this  technique  is  extremely  slow  on  large  matrices,  while  the  program 
takes  no  longer  than  PRINCIPAL  AXIS  FACTOR  ANALYSIS  for  small  matrices 
(up  to  20  x  20). 
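
The rotation technique itself is short to state. The following is a minimal Python sketch of a cyclic Jacobi method for a small symmetric matrix, not the SOUPAC program: each off-diagonal element is annihilated in turn by a plane rotation until the off-diagonal sum of squares is negligible. The function name, tolerance, sweep limit and sample matrix are all assumptions made for the illustration.

```python
import math

def jacobi_eigen(A, tol=1e-20, max_sweeps=50):
    """Eigenvalues (descending) and eigenvectors (as columns) of symmetric A."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Angle chosen so that the rotated element (p, q) becomes zero.
                if A[p][p] == A[q][q]:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * A[p][q] / (A[q][q] - A[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # rotate columns p and q of A
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # rotate rows p and q of A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):   # accumulate the rotation in V
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: A[i][i], reverse=True)
    values = [A[i][i] for i in order]
    vectors = [[V[k][i] for i in order] for k in range(n)]
    return values, vectors

R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
values, vectors = jacobi_eigen(R)
# Principal axis factors: each eigenvector scaled by the root of its eigenvalue.
factors = [[vectors[k][i] * math.sqrt(max(values[i], 0.0)) for i in range(3)]
           for k in range(3)]
print([round(v, 4) for v in values])
```

Every sweep visits all pairs of rows, so the work per sweep grows quickly with the order of the matrix, which is the source of the slowness on large matrices noted above.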


II.   Parameters

Parameter
Number    Use or Meaning

1         Input Address of correlation matrix.  CARDS
          or SEQUENTIAL 1-15.

2         Output Address of eigenvectors.

3         Output Address of principal axis factor.

4         Output Address of eigenvalues, stored as a row
          vector.  PRINT is not valid.  The eigenvalues
          are always printed.

5         Number of eigenvectors (or factors) to be output.

III.   Special Comments

The  eigenvalues  are  stored  in  descending  algebraic  order  (from  largest 
to  smallest),  and  the  eigenvectors  and  factors  are  placed  in  the  same  order. 


IV.   Reference 


Ralston, A. and Wilf, H. S.:  Mathematical Methods for Digital Computers,
John Wiley and Sons, New York, 1964.


OBLIMAX ROTATION


I.   General  Description 

The  OBLIMAX  OBLIQUE  ROTATION  transforms  a  set  of  factors  F  to  a  new 
set  V  such  that  the  factor  kurtosis, 

$$ K = \frac{\sum_i \sum_j v_{ij}^4}{\left(\sum_i \sum_j v_{ij}^2\right)^2}, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, k $$

is  at  a  maximum. 
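
The criterion is simple to evaluate for any matrix of loadings. The following is a minimal Python sketch, not the OBLIMAX program, that computes K for two hypothetical 2 x 2 loading matrices: one with clean simple structure and the same matrix rotated 45 degrees so that every loading is of equal size. The function name and the matrices are illustrative assumptions.

```python
import math

def oblimax_criterion(V):
    """Factor kurtosis K: sum of fourth powers over squared sum of squares."""
    s4 = sum(v ** 4 for row in V for v in row)
    s2 = sum(v ** 2 for row in V for v in row)
    return s4 / s2 ** 2

# Each variable loads on a single factor.
simple = [[0.8, 0.0],
          [0.0, 0.8]]

# The same loadings rotated by 45 degrees: all loadings equal in size.
r = 0.8 * math.cos(math.pi / 4.0)
mixed = [[r, r],
         [-r, r]]

K_simple = oblimax_criterion(simple)
K_mixed = oblimax_criterion(mixed)
print(round(K_simple, 4), round(K_mixed, 4))
```

A few large loadings among many near-zero ones give a larger K than loadings of uniform size, which is why maximizing K tends toward simple structure.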

The  purpose  of  the  transformation  is  to  attempt  to  rotate  analytically 
to  a  position  similar  to  that  obtained  by  applying  Thurstone's  rules  for 
simple structure.  (See Multiple Factor Analysis, L. L. Thurstone, 1947,
pp. 319-410.)  However, Thurstone's rules and the oblimax procedure are
not  the  same,  and  it  is  too  much  to  expect  that  results  obtained  from 
both  procedures  will  agree  exactly. 

It  would  be  desirable  to  solve  directly  for  the  transformation  matrix 
T,  but  unfortunately  no  solution  to  this  problem  has  been  found.   Instead 
oblimax  takes  two  vectors  at  a  time,  solves  for  the  rotational  angles, 
transforms the vectors, and then selects another pair until all k(k-1)
pairs have been rotated.  This process is repeated iteratively until the
criterion K no longer increases.  Despite the pairwise procedure, K is
well behaved, and in general, approaches steadily to a maximum.

For  any  pair  of  factors,  a  and  b,  the  solution  proceeds  as  follows: 


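$$ K_{ab} = \frac{\sum_i \sum_j (a_i \cos\theta_j + b_i \sin\theta_j)^4}{\left[\sum_i \sum_j (a_i \cos\theta_j + b_i \sin\theta_j)^2\right]^2} = \frac{\sum_i \sum_j (a_i + b_i X_j)^4}{\left[\sum_i \sum_j (a_i + b_i X_j)^2\right]^2} $$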


The derivative of K_ab is set equal to zero, resulting in a quartic
equation in X, which is tan θ.  Two solutions for X will be maxima, and,
as each X is found, the sign of the second derivative is inspected to select
maxima.  A small transform (2 x 2) is created, but before post-multiplication
is performed, the transform must be adjusted so that when it becomes a part
of T, t_a and t_b will remain normalized.  In this way, both B and T are
developed pair by pair.

For  references  see: 

Pinzka, C., and Saunders, D. R., "Analytic Rotation to
Simple Structure II:  Extension to an Oblique Solution."
Research Bulletin RB-54-31.  Princeton, N. J.:  Educational
Testing Service, 1954.

II.   Alternate  Use 

If the user already has a transformation matrix, he may use it to
compute Vrs et al. by giving the input address of T in parameter 8; in this case,
the oblimax calculation of T will be skipped.  If both F and T are to be
input from cards, then the data deck of F should precede the deck of T.


III.   Output 

The  OBLIMAX  program  always  prints  the  following  (unless  parameter  8 
is  used) : 

1.  The  value  of  K  for  each  pass 

2.  The  iteration  time 

It  outputs  the  following  on  demand  (See  Parameters) : 

1.  Transformation  matrix  T 

2.  Reference  vector  structure,  Vrs  =  FT 

3.  Reference vector correlations, Crs = T'T

4.  Diagonal of D and of D^-1

    where D is the diagonal matrix of the reciprocal
    square roots of the diagonal elements of Crs^-1

5.  Primary factor pattern, Vfp = FTD = VrsD

6.  Primary factor correlations, Cfp = D Crs^-1 D

All data is printed out to seven decimal places.

IV.   Restrictions 

The  number  of  variables  plus  the  number  of  factors  must  be  no  more 
than  300. 

V.   Note  on  Parameter  2 

If  row  normalization  is  specified,  the  normalization  constants  will  be 
preserved and the rows will be rescaled to proper length after rotation and
prior  to  output. 

VI.   Parameters

The  program  name,  OBLIMAX,  appears  first  on  the  program  call  card 
and  is  followed  by  the  following  parameters.   Any  output  option  (except 
parameter  9)  may  be  SEQUENTIAL  1-15  and/or  PRINT;  it  may  be  left  blank 
if  not  desired. 

Parameter 
Number  Use  or  Meaning 

1  Input Address of F.  CARDS or SEQUENTIAL 1-15.

2  If rows are to be normalized before rotation,
   punch a 1; otherwise punch a zero or leave blank.
   (See Note on Parameter 2).

3  Output Address of T.

4  Output Address of Vrs

5  Output Address of Crs

6  Output Address of Vfp

7  Output Address of Cfp

8  Input  Address  of  T  (See  Alternate  Use) . 

9  D- value  and  Inverse  of  D.   PRINT  only. 


PRINCIPAL  AXIS  FACTOR  ANALYSIS 
(Eigenvalues  and  Vectors) 


I.   General Description

The  purpose  of  PRINCIPAL  AXIS  FACTOR  ANALYSIS  is  to  determine  a 
factor matrix, F, given a Gramian matrix, R, of order n such that

F(n,f)F'(f,n)  =  R*(n,n) 
where  R*  is  an  approximation  to  R. 

The  column  vectors  of  F  are  defined  as  the  factors  (measures  of 
dimensionality)  of  the  original  matrix,  R.   The  solution  for  the  matrix 
F  is  the  classical  eigen  problem.   Consequently,  the  computations  are 
done by an eigenvalue subroutine. Before output, the eigenvectors, Ej,
are scaled as follows:

F(I,J) = E(I,J)*LAMBDA(J)**.5

for  I = 1, ..., n;  J = 1, ..., n

to generate the principal axis factors, F. (See Introduction on Factor Analysis.)
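
The scaling step can be illustrated with a short Python sketch. It is not the SOUPAC eigenvalue subroutine; the function names and the 2 x 2 matrix are hypothetical, and the eigenvalues and eigenvectors are supplied by hand rather than computed. It only shows that multiplying each eigenvector by the square root of its root gives an F for which FF' reproduces R when all factors are kept, and an approximation R* when fewer are kept.

```python
import math

def scale_eigenvectors(eigvecs, eigvals):
    """Return F with F[i][j] = E[i][j] * sqrt(LAMBDA[j]).

    eigvecs is a list of rows; column j is the unit-length eigenvector
    that belongs to eigvals[j].
    """
    return [[row[j] * math.sqrt(eigvals[j]) for j in range(len(eigvals))]
            for row in eigvecs]

def times_transpose(f):
    """Return the product F F' as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_k)) for row_k in f]
            for row_i in f]

# Hypothetical correlation matrix R = [[1, .6], [.6, 1]].
# Its roots are 1.6 and 0.4, with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
c = 1 / math.sqrt(2)
E = [[c, c], [c, -c]]
LAMBDA = [1.6, 0.4]

F = scale_eigenvectors(E, LAMBDA)
R_full = times_transpose(F)                      # both factors kept: R itself
R_one = times_transpose([row[:1] for row in F])  # first factor only: R*
```

With both factors kept, R_full matches R to rounding error; with one factor, every entry of R_one is 0.8, which is the rank-one approximation R*.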

For  a  more  detailed  discussion  see: 

Harry Harman, Modern Factor Analysis, Chicago, University of
Chicago Press, 1960, pp. 154-191.

II.   Restrictions

The  input  matrix  for  the  PRINCIPAL  AXIS  program  must  not  exceed  the 
dimensions  of  190  x  190  double  precision.    The  input  matrix  is  further 
limited  to  being  a  square,  symmetric  matrix.   Generally  correlation, 
covariance,  or  cross-product  matrices  are  used  as  input  data.   It  should 
be  noted  that  matrices  with  large  numerical  entries  such  as  cross-products 
may generate output values which cannot be printed under the fixed output
formats. The probability of this happening is very small. Any communality
estimation (i.e., change in the diagonal entries of R) must be
done prior to the input of R to the PRINCIPAL AXIS program.

If  the  communality  estimates  are  used,  the  user  should  check  the 
resulting  roots  for  negative  numbers.   If  any  exist  the  associated  vector 
is  meaningless. 

The  input  data  may  come  from  any  source  conforming  to  SOUPAC.   Similarly, 
the  output  codes  follow  the  established  conventions  and  are  specified  at  the 
option  of  the  user. 

The R matrix may be completely factored (i.e., N factors from an N-variable
matrix). However, there are three criteria which may be used to stop
the factoring:

1.  The user may specify the number of factors to be extracted.
This  criterion  provides  an  upper  limit  beyond  which  factoring 
will  not  proceed.   Therefore,  it  is  necessary  to  put  the 
maximum  value  in  this  limit  in  cases  where  it  is  not  the 
primary  criterion. 

2.  The  percentage  of  total  variance  removed  from  R  is  the 
second  limiting  criterion.   This  parameter  also  specifies 
an  upper  limit  to  the  process.   Therefore,  it  should  be  set 
at  100  per  cent  unless  it  is  the  criterion  for  stopping. 

3.  The last criterion is to stop when the factor contribution
(eigenvalue or root) falls below 1. The use of this procedure
is dictated by the presence of its parameter.

If  all  three  criteria  are  employed  simultaneously,  factoring  is  stopped 
by  whichever  criterion  is  first  met. 
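
How the three criteria interact can be sketched in Python as follows. This is an illustration, not the SOUPAC logic: the function name `count_factors` is hypothetical, the roots are assumed to arrive in descending order, and total variance is assumed to be the sum of all roots (the trace of R).

```python
def count_factors(eigenvalues, max_factors, percent=100, stop_below_unity=False):
    """Return the number of factors kept before any criterion stops factoring.

    eigenvalues      -- roots in descending order (assumption)
    max_factors      -- upper limit on the number of factors
    percent          -- percentage of total variance to be removed (0 to 100)
    stop_below_unity -- stop as soon as a root falls below 1
    """
    total = sum(eigenvalues)
    kept = 0
    removed = 0.0
    for root in eigenvalues:
        if kept >= max_factors:                    # criterion 1
            break
        if 100.0 * removed / total >= percent:     # criterion 2
            break
        if stop_below_unity and root < 1.0:        # criterion 3
            break
        kept += 1
        removed += root
    return kept

roots = [2.5, 1.2, 0.8, 0.5]   # hypothetical roots, total variance 5.0
```

With these roots, `count_factors(roots, 4)` keeps all four factors, `count_factors(roots, 4, percent=70)` stops after two (74 per cent removed), and `count_factors(roots, 4, stop_below_unity=True)` also stops after two because the third root is below 1.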


III.   Parameters 


The  parameters  for  the  PRINCIPAL  AXIS  program  appear  on  the  program 
call  card.   They  must  follow  the  program  name  in  this  order: 

Parameter 
Number  Use  or  Meaning 

1  Input Address.  CARDS or SEQUENTIAL 1-15.

2  Output Address.  SEQUENTIAL 1-15 and/or PRINT.

3  Maximum number of factors to be extracted.
   This must be less than or equal to the
   order of the input matrix.

4  The percentage of total variance to be
   removed, expressed as an integer between
   0 and 100.

5  The  presence  of  a  number  greater  than  0 
indicates  the  factoring  should  stop  when 
the  eigenvalues  (roots)  fall  below  unity. 

6  Output  Address  of  Eigenvectors 


   The address where the eigenvalues are to be
   placed as a row vector if they must be
   stored for further use.  If the values need
   not be saved, leave the parameter blank.  PRINT
   is not valid.

   Mode of sorting eigenvalues and associated
   vectors.  The codes are as follows:

   Code   Meaning

   0      Descending algebraic order
   1      Descending absolute values
   2      Order of extraction
   10     Ascending algebraic order
          (the k smallest roots)
   11     Ascending absolute values
   12     Reverse order of extraction


Leaving any parameter blank is the same as specifying zero. Consequently,
options which are not needed can be avoided by leaving the
associated parameter blank.


PROCRUSTES  (Oblique  Case) 

General  Description 

This program offers 3 options:

1.   (Oblique) Procrustes.  Given A, B, the program solves

          AT* = B + E

     for T* in a least squares sense (i.e., minimizing tr[E'E]),
     so that

          T* = (A'A)^-1 A'B,

     and then normalizes T* by columns to yield T = T*D so that
     diag (T'T) = I.  It then computes AT which, in a loose sense,
     can be regarded as a least squares fit of A to B under the
     restriction that diag (T'T) = I.  It also provides
     Cf = Dn (T'T)^-1 Dn, where Dn is a normalizing diagonal matrix
     chosen so that diag (Cf) = I.  If D gave the cosines between tests,
     Cf will give the factor intercorrelations.  A must be of
     full column rank.

2.   Dwyer Extension Analysis.  Given F = Rtc, a centroid or
     equivalent matrix of cosines between tests t and uncorrelated
     factors c, and L = Rcn, a matrix of cosines between uncorrelated
     factors c and uncorrelated reference vectors n, this
     program computes

          Q = Ttn = F(F'F)^-1 L

     which is used as a post-multiplier on some correlation
     matrix Ret between the tests t in F and some set of
     extension variables e to give Ren, the cosines of the extension
     variables e with the reference vectors n, to the extent that the
     former can be projected into the subspace spanned by the latter.
     This multiplication

          Ren = Ret Ttn

     can be performed by use of the MATRIX program.

3.   Left Inverse (transposed).  Given A, the program will return

          Q = A(A'A)^-1

     provided A is of full column rank.  Q is the transposed left
     inverse of A, which can be used in least squares applications.

II.   Restrictions


Input is restricted to matrices (A, B, or F) of order 190 x 50 or less,

III.   Parameters


The  parameters  for  this  program  appear  on  the  program  call  card. 
They  must  follow  the  program  name  in  this  order: 


Parameter
Number   Use or Meaning                                    Procrustes   LEA   LINV

1        Input Address.  CARDS or SEQUENTIAL 1-15.         A            F     A

2        Input Address.  CARDS or SEQUENTIAL 1-15.         B            L

3        Output Address.  SEQUENTIAL 1-15 and/or PRINT.    A            F     A

4        Output Address.  SEQUENTIAL 1-15 and/or PRINT.    B            L

5        Output Address.  SEQUENTIAL 1-15 and/or PRINT.    T            Q     A(A'A)^-1

6        Output Address.  SEQUENTIAL 1-15 and/or PRINT.    Cf

7        Output Address.  SEQUENTIAL 1-15 and/or PRINT.    AT

8        Output Address.  SEQUENTIAL 1-15 and/or PRINT.    E

9        Choice                                            0            1     2


SQUARE  ROOT  FACTOR  ANALYSIS 


I.   General  Description 

The  SQUARE  ROOT  method  of  factor  analysis,  also  called  the  Diagonal 
Method,  by  L.  L.  Thurstone,  decomposes  a  correlation  matrix  R  (or  any 
other  positive  semi-definite  or  definite  symmetric  matrix)  such  that 

R  =  F  F'  +  R(k+1) 

where R(k+1) is the residual matrix after extracting k factors. Of course,
if all n factors are extracted, the residual matrix becomes a null matrix.

The factor fj is computed by dividing each element of the jth column
of R by its diagonal square root:

$$
f_{ij} = \frac{r_{ij}}{\sqrt{r_{jj}}} \qquad (i = 1, 2, \ldots, n)
$$

The matrix A = fj fj' is then subtracted from R and the operation repeated
on the residual matrix.

Prior  to  the  widespread  use  of  high  speed  computers,  the  SQUARE  ROOT 
method  was  sometimes  used  as  a  substitute  for  the  PRINCIPAL  AXIS  or  CENTROID 
method  due  to  the  relative  ease  of  computing  a  square  root  factor.  When 
used  in  this  way,  one  seeks  to  extract  the  maximum  variance  for  each  factor, 
in which case Parameter 4 should be blank. The program then selects
the  next  column  on  the  basis  of  the  largest  residual  column  sum  of  squares. 

Nowadays,  however,  the  SQUARE  ROOT  method  is  more  likely  to  be  used 
for  special  purposes.  By  selecting  successive  pivot  variables,  the  user 
retains  control  over  the  factoring.  Factors  are  passed  directly  through 
the  test  variables  and  the  effect  of  these  variables  is  removed  from  the 
matrix.  The  communalities  or  row  sums  of  squares  are  the  squared  multiple 
correlations  of  the  remaining  variables  with  the  pivot  variables. 
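
The extraction loop described above can be sketched in Python. This is a minimal illustration rather than the SOUPAC routine: the function name and the 2 x 2 matrix are hypothetical, pivots are given as 0-based column indices, and no guard is included for a zero or negative residual diagonal.

```python
import math

def square_root_factors(r, n_factors, pivots=None):
    """Extract factors by the square root (diagonal) method.

    r         -- symmetric matrix as a list of rows (left unmodified)
    n_factors -- number of factors to extract
    pivots    -- optional 0-based pivot columns, one per factor; when omitted,
                 the column with the largest residual sum of squares is used
    Returns (factors, residual), where factors[k] is the k-th factor column.
    """
    n = len(r)
    resid = [row[:] for row in r]
    factors = []
    for k in range(n_factors):
        if pivots is not None:
            j = pivots[k]
        else:
            j = max(range(n),
                    key=lambda c: sum(resid[i][c] ** 2 for i in range(n)))
        root = math.sqrt(resid[j][j])
        f = [resid[i][j] / root for i in range(n)]   # f_ij = r_ij / sqrt(r_jj)
        for i in range(n):
            for m in range(n):
                resid[i][m] -= f[i] * f[m]           # subtract f f'
        factors.append(f)
    return factors, resid

R = [[1.0, 0.6], [0.6, 1.0]]                         # hypothetical correlations
one_factor, resid_one = square_root_factors(R, 1, pivots=[0])
two_factors, resid_two = square_root_factors(R, 2)
```

Pivoting on the first variable gives the factor (1, 0.6) and leaves 0.64 in the remaining diagonal cell; extracting both factors leaves a null residual matrix up to rounding.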


The pivots selected may be any columns in the matrix. Let us assume,
however, that these are adjacent to each other in the upper left-hand
corner of the partitioned matrix below:

$$
R = \begin{bmatrix} R_{pp} & R_{ps} \\ R_{sp} & R_{ss} \end{bmatrix}
$$

Then the effect of pivoting successively on the variables in the upper
left-hand corner is shown by the residual matrix as follows:

$$
R(p+1) = R - F_p F_p' =
\begin{bmatrix} 0 & 0 \\ 0 & R_{ss} - R_{sp} R_{pp}^{-1} R_{ps} \end{bmatrix}
$$

II.   Restrictions

A.   Dimension

Maximum size of the R matrix is 190 variables.

B.   Special Conditions

1.  The researcher may specify the extraction of any number of
factors up to the dimension of R.

2.  The researcher may specify the diagonal element to be used
in the extraction of each factor, or he may have the procedure
remove the maximum variance each time.

3.  The  residual  matrix  may  be  saved  if  the  researcher  desires. 


III.   Parameters 


Following  the  program  name  the  parameters  must  appear  in  the  following 
order  on  the  program  call  card: 


Parameter
Number  Use or Meaning

1  Input Address.  CARDS or SEQUENTIAL 1-15.
   (Correlation or positive definite or semi-definite
   matrix).

2  Output Address.  SEQUENTIAL 1-15 and/or PRINT.

3  Number of factors extracted.

4  Input Address for diagonal elements.
   CARDS or SEQUENTIAL 1-15.

5  Output Address for residual matrix.
   SEQUENTIAL 1-15 and/or PRINT.


IV.   Special  Comments 

If  the  diagonal  element  for  each  factor  is  specified,  and  if  both 
input  addresses  are  cards,  then  data  precedes  diagonal  specification. 


V.   Example 

Assume you have a 20 x 20 correlation matrix on cards and that you
want to extract 15 factors; also assume you are reading the pivot columns
from cards. The program would be set up as follows:

/*ID
// EXEC SOUPAC
//SYSIN DD *
SQU(C)(P)(15)(C)(P).
END SOUPAC
DATA(20)(8F9.7)
     data
END #
DATA(15)(15I2)
     diagonal specification card(s)
END #
/*

THREE-MODE  FACTOR  ANALYSIS 


I.  General  Description 

A.  GENERAL  COMMENTS 

This  program  provides  a  factor  analytic  solution  for  a  3-dimensional 
i  by  j  by  k  data  matrix.   The  computational  procedures  employed  are 
those  presented  in  Method  III  of  Tucker's  article  (reference  below). 
This method provides the most efficient analysis when one of the modes,
usually individuals, is quite large, though this is certainly not a
necessary condition.

B.  THE  THEORETICAL  MODEL 

Here, i, j, and k represent the modes of classification which
are  directly  related  to  the  observation  of  the  data;  i,  j,  and  k  are 
thus  termed  observational  modes.   An  example  would  be  the  observation 
of  scores  for  i  individuals  on  j  tests  given  under  k  different  conditions. 

Through factoring, we wish to reduce the observational modes i, j, and
k to corresponding derivational modes m, p, and q. Each of the derivational
modes can be thought of as a set of factors in the domain of the
corresponding  observational  mode.  The  core  matrix  G  then  serves  to 
describe  the  relationships  among  the  derivational  modes. 

The  fundamental  three-mode  factor  analysis  model  is  represented  by 
the  equation : 

$$
\hat{x}_{ijk} = \sum_m \sum_p \sum_q a_{im}\, b_{jp}\, c_{kq}\, g_{mpq},
$$

where $\hat{x}_{ijk}$ is an approximation to the observed score $x_{ijk}$;
$a_{im}$, $b_{jp}$, and $c_{kq}$ are entries in two-mode matrices ${}_iA_m$,
${}_jB_p$, and ${}_kC_q$ describing the elements in the observational modes
i, j, and k in terms of the dimensions in the derivational modes m, p, and q
respectively; the coefficients $g_{mpq}$ are entries in a three-dimensional
matrix G and represent the measures of the phenomenon being observed for each
combination of the dimensions of the derivational modes.

In matrix form, the model could be represented as:

$$
{}_i\hat{X}_{(jk)} = {}_iA_m \; {}_mG_{(pq)} \left( {}_pB_j \otimes {}_qC_k \right),
$$

where $\otimes$ indicates a Kronecker product. Matrices A, B, and C are
factor solutions for modes i, j, and k respectively which serve to
transform the core matrix G of the 3 derivational modes to the matrix
X representing the 3 observational modes.
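
The summation form of the model can be made concrete with a short Python sketch. It is only an illustration of the formula, not part of the three-mode program; the function name and the small A, B, C, and G arrays are hypothetical.

```python
def three_mode_scores(a, b, c, g):
    """Return x_hat with
    x_hat[i][j][k] = sum over m, p, q of a[i][m] * b[j][p] * c[k][q] * g[m][p][q].
    """
    return [[[sum(a_i[m] * b_j[p] * c_k[q] * g[m][p][q]
                  for m in range(len(a_i))
                  for p in range(len(b_j))
                  for q in range(len(c_k)))
              for c_k in c]
             for b_j in b]
            for a_i in a]

# Hypothetical example: 3 individuals, 2 tests, 2 conditions,
# reduced to 2, 1, and 1 derivational dimensions respectively.
A = [[1, 0], [0, 1], [1, 1]]      # i by m
B = [[1], [2]]                    # j by p
C = [[1], [3]]                    # k by q
G = [[[2]], [[5]]]                # core matrix, m by p by q

X_HAT = three_mode_scores(A, B, C, G)
```

For the third individual on the second test under the second condition, the sum is (1*2 + 1*5) * 2 * 3 = 42, which is the value stored in `X_HAT[2][1][1]`.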
C.   INPUT DATA


The input data must be a Gramian matrix, usually correlations,
covariances, or cross-products, in the form ${}_{jk}R_{jk}$, where i is
assumed to be the largest mode and jk represents the combination mode, with
mode k nested within mode j.

II.   Output 

The  output  consists  of  the  following: 

(1)   the ${}_jP_j$ and ${}_kQ_k$ matrices which represent the correlations,
covariances, or cross-products within modes j and k respectively;

(2)   the eigenvalues and eigenvectors of ${}_{jk}R_{jk}$;

(3)   the eigenvalues and eigenvectors of ${}_jP_j$;

(4)   the eigenvalues and eigenvectors of ${}_kQ_k$;

(5)   the core matrix G, where m, p, and q represent the derivational
modes corresponding to observational modes i, j, and k respectively.

All  of  this  is  printed  out  and  may  also  be  stored  on  sequential  storage 
devices.   The  user  must  specify  the  number  of  factors  to  be  extracted 
from each of the three modes. This procedure is employed since the use
of other factor-stopping criteria (e.g., percent of variance accounted
for, or eigenvalues below unity) could easily lead to the computation of
a  great  many  useless  factors  as  well  as  a  very  large  and  unmanageable 
core  matrix.   The  user  is  also  cautioned  against  specifying  large  numbers 
of  factors  since  this  would  cause  substantial  increases  in  time  required 
to  factor  the  various  modes  and  compute  the  core  matrix. 

It  is  strongly  recommended  that  the  user  be  familiar  with  the  Tucker 
article  and  with  factor  analysis  in  general  before  attempting  to  use 
this  program. 

III.   Parameters 

The program mnemonic (T-M) appears first on the program card and is
followed by the following 17 parameters, the first 8 of which are required.
Output addresses are optional. All output will be printed.

Parameter 
Number  Use  or  Meaning 

1  Input address of R matrix in the form ${}_{jk}R_{jk}$
   (CARDS or S1-S15).

2  Number  of  subjects  or  elements  in  mode  i. 

3  Number  of  variables  in  mode  j. 
4  Number of variables in mode k.

5  Number of factors to be removed from matrix R (mode i).

6  Number of factors to be removed from matrix P (mode j).

7  Number of factors to be removed from matrix Q (mode k).

8  Type of input matrix:
   (1)  If input matrix is cross-products.
   (2)  If input matrix is correlations.
   (3)  If input matrix is covariances.

9  Output address for row vector of eigenvalues of R.

10  Output address for eigenvectors of R.

11  Output address for matrix ${}_jP_j$.

12  Output address for row vector of eigenvalues of P.

13  Output address for eigenvectors of P.

14  Output address for matrix ${}_kQ_k$.

15  Output address for row vector of eigenvalues of Q.

16  Output address for eigenvectors of Q.

17  Output address for core matrix ${}_{pq}G_m$.


IV.   Reference 

Tucker,  Ledyard  R.   Some  mathematical  notes  on  three-mode  factor  analysis, 
Psychometrika,  1966,  31,  279-311. 


UNRESTRICTED  MAXIMUM  LIKELIHOOD  FACTOR  ANALYSIS 

Parameter 
Number  Use  or  Meaning 

1  Input Address for correlation matrix.
   SEQUENTIAL 1-15; CARDS are not permitted.

2  Output Address for final unrotated factor matrix.
   SEQUENTIAL 1-15.  See also Parameter 13.

3  Input Address for row vector of initial
   estimate of uniqueness.  CARDS, SEQUENTIAL
   1-15 (optional).

4  Lower bound for number of factors.

5  Upper bound for number of factors.

6  Sample size (number of observations) on
   which correlation matrix is based.

7  Maximum number of iterations.

8  Probability of chance occurrence, e.g., *1.00*.

9  1 to print input correlation matrix and partial
   correlation matrices after any variables have
   been removed.

10  1 to print technical output.

11  1 to print intermediate results.

12  1 to punch unrotated factor matrices.

13  1 to apply a varimax rotation to all factor
    matrices.  If this parameter is used the output
    of parameter 2 will be a rotated factor matrix.

This program has been taken directly from Jöreskog (1967) with his permission.
Anyone interested in the methods is referred to the references listed
below. The program is temporarily limited to 75 variables and 30 factors.
Parameters 1, 4, 5, 6, 7, and 8 are required. Parameter 8 must be enclosed
within asterisks, **, and must have a punched decimal point.

References : 

Jöreskog, K. G.  UMLFA - a computer program for unrestricted maximum likelihood
factor analysis.  Research Memorandum 66-20.  Princeton, New Jersey:
Educational Testing Service.  Revised Edition, 1967.

Jöreskog, K. G.  Some contributions to maximum likelihood factor analysis.
Psychometrika, 1967, 32, 443-482.


VARIMAX  FACTOR  ROTATION 


I.   General  Description 

VARIMAX ROTATION is used to redistribute the variance of a factor matrix
(principal axis, centroid, etc.) so that the matrix approaches orthogonal
simple structure. The varimax scheme maximizes the following criterion
function:

$$
\sum_s \left[ n \sum_j \left( \frac{a_{(j,s)}^2}{h_{(j)}^2} \right)^2
            - \left( \sum_j \frac{a_{(j,s)}^2}{h_{(j)}^2} \right)^2 \right]
$$

where j is the variable index number: 1, ..., n

s is the factor index number: 1, ..., f

$a_{(j,s)}$ is the factor loading of the jth variable on the sth factor

$h_{(j)}^2$ is the communality of the jth variable
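
To make the criterion concrete, the following Python sketch evaluates it for a given loading matrix. It does not perform the rotation itself and is not the SOUPAC routine; the function name is hypothetical, and the `normal` switch is an assumption that mirrors the distinction between normal and raw varimax (raw varimax is taken here to mean that loadings are not divided by the communalities). Rows with zero communality are not handled.

```python
def varimax_criterion(loadings, normal=True):
    """Evaluate the varimax criterion for a loading matrix (list of rows).

    loadings[j][s] is the loading of variable j on factor s.
    With normal=True each squared loading is divided by the communality
    of its variable (the row sum of squares) before the criterion is formed.
    """
    n = len(loadings)
    n_factors = len(loadings[0])
    h2 = [sum(x * x for x in row) for row in loadings]   # communalities
    total = 0.0
    for s in range(n_factors):
        sq = [loadings[j][s] ** 2 / (h2[j] if normal else 1.0) for j in range(n)]
        total += n * sum(v * v for v in sq) - sum(sq) ** 2
    return total

simple = [[1.0, 0.0], [0.0, 1.0]]        # each variable loads on one factor
c = 0.5 ** 0.5
mixed = [[c, c], [c, -c]]                # same variables, rotated 45 degrees
```

The simple-structure matrix gives a criterion of 2.0, while the 45-degree rotation of the same configuration gives a value of zero to rounding error, which is why the rotation seeks the first position.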

For  further  discussion  see: 

H. F. Kaiser, "Computer Program for Varimax Rotation in Factor
Analysis", Educational and Psychological Measurement, Vol. XIX,
No. 3, 1959, pp. 413-420.

Cooley and Lohnes, Multivariate Procedures for the Behavioral
Sciences, New York, John Wiley and Sons, Inc., 1962, pp. 161-3.

II.   Restrictions 

The  input  matrix  for  VARIMAX  ROTATION  must  not  exceed  190  variables 
and  190  factors.   The  number  of  factors  may  be  anything  greater  than  or 
equal  to  2.   Any  factor  matrix  generated  by  a  statistical  system  factor 
analysis  program  is  acceptable  input.   A  matrix  may  also  be  entered  from 
cards. 

III.   Parameters 

The  parameters  for  the  VARIMAX  ROTATION  appear  on  the  program  call 
card.   They  must  follow  the  program  name  in  this  order: 

Parameter 
Number  Use  or  Meaning 

1  Input Address.  CARDS or SEQUENTIAL 1-15.

2  Output Address.  SEQUENTIAL 1-15 and/or PRINT.

3  The presence of a number greater than 0 in this
   parameter indicates the communalities should be
   printed.

4  0  or  blank  for  normal  VARIMAX.   1  if  raw  VARIMAX 
is  desired. 

*If CARDS are used the DATA card must contain the number of rows as well
as the number of columns in the input matrix (see User's Guide for details).


SOUPAC (Statistically Oriented Users Programming and Consulting)


ECONOMETRICS  PACKAGE 


ECONOMETRIC  REDUCED  FORM  AND  RESIDUAL  ANALYSIS 

I.   General Description

The  ECONOMETRIC  REDUCED  FORM  AND  RESIDUAL  ANALYSIS  program  calculates 
the  following : 

(1)  Residuals

$$U = (Y\,X)\,[\beta]'$$

where:

(YX) is the raw data matrix of endogenous and exogenous variables
and $[\beta]$ is the matrix of coefficient estimates.

(2)  Durbin-Watson statistic for each equation (i)

$$
d_i = \frac{\sum_{t=2}^{N} \left[ U_i(t) - U_i(t-1) \right]^2}
           {\sum_{t=1}^{N} \left[ U_i(t) \right]^2}
$$

(3)  Covariance matrix for residuals

$$W = [\beta]\,(S)\,[\beta]'$$

where:

S is the raw data covariance matrix.

(4)  Reduced form estimates

$$\Pi = -(\beta)^{-1}(\Gamma)$$

(5)  Reduced form predicted values

$$\hat{Y} = X\,\Pi'$$

(6)  Reduced form residuals

$$V = Y - \hat{Y}$$

(7)  Covariance matrix for reduced form residuals

$$(\beta^{-1})'\,W\,(\beta^{-1})$$
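
As an illustration of item (2), the Durbin-Watson statistic for one equation's residual series can be computed with a few lines of Python. This is a sketch of the standard formula, not the ECONOMETRIC program's code; the function name and the residual values are hypothetical.

```python
def durbin_watson(u):
    """Durbin-Watson statistic for a residual series u(1), ..., u(N).

    The numerator sums squared successive differences for t = 2, ..., N;
    the denominator sums the squared residuals for t = 1, ..., N.
    """
    numerator = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    denominator = sum(x * x for x in u)
    return numerator / denominator

alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative serial correlation
constant = [1.0, 1.0, 1.0, 1.0]        # strong positive serial correlation
```

The alternating series gives 12/4 = 3.0 and the constant series gives 0.0; values near 2 indicate little first-order serial correlation in the residuals.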


References

Johnston, J., Econometric Methods, New York, McGraw-Hill Book
Company, Inc., 1960.

Goldberger, Arthur S., Econometric Theory, New York, John Wiley
and Sons, Inc., 1964.

II.   Restrictions


Only  those  inputs  used  in  the  calculations  called  for  need  be  given. 
They  must  be  in  the  following  formats : 

(1)  Coefficients: 

The coefficient matrix for K equations with N variables, N1
exogenous and N2 endogenous, must be a K by N+1 matrix. Each
row corresponds to an equation. The first element in each row
is the constant term followed by the coefficients matrix (i.e., exogenous
coefficients first; endogenous coefficients next). In
each row, there must be a -1 which corresponds to the endogenous
variable that was normalized on.

(2)  Raw  Data: 

The  data  must  be  arranged  so  that  exogenous  variables  occur 
first  and  endogenous  variables  last.   (The  TRANSFORMATION  program 
may  be  used  to  arrange  data  in  this  way,  if  it  is  not  already 
like this).

(3)  Raw  Data  Covariance  Matrix: 

The covariance matrix must have the following form:

     sample size             |  means
     ------------------------+------------------------------------------------
     (standard deviations)   |  Covariance (exogenous first, endogenous last)


Care  should  be  taken  to  see  that  an  input  address  is  specified  for  any  data 
needed  in  calculating  the  desired  statistics  and  that  any  intermediate 
statistics  needed  are  stored  (i.e.,  an  output  address  besides  print  is 
specified). The following list indicates which previous statistics are
needed  in  the  calculation  of  each  statistic. 
1.  Residuals - coefficients and raw data

2.  Durbin-Watson statistic - coefficients and raw data

3.  Covariance matrix for residuals - coefficients and raw
data covariance matrix

4.  Reduced form coefficients - coefficients

5.  Reduced form predicted values and residuals - reduced form
coefficients (no output address) and raw data

6.  Covariance matrix of reduced form residuals - reduced form
coefficients and covariance matrix of original residuals

This  program  is  restricted  to  less  than  150  variables. 

III.   Parameters 

The parameters appear on the program card following the name ECONOMETRIC
in the following order:

Parameter 
Number  Use  or  Meaning 

1  Input Address for coefficients.  SEQUENTIAL 1-15.

2  Input Address for raw data covariance matrix.
   SEQUENTIAL 1-15.

3  Input Address for raw data.  SEQUENTIAL 1-15.
   (See Special Comments).

4  Number of exogenous variables (total).

5  Output Address for residuals.  SEQUENTIAL 1-15
   and/or PRINT.

6  Output Address for covariance matrix of
   residuals.  SEQUENTIAL 1-15 and/or PRINT.

7  If greater than 0, reduced forms are
   calculated and printed.

8  If greater than 0, reduced form predicted
   values and residuals are printed.

9  If greater than 0, covariance matrix for
   reduced form residuals is printed.

IV.   Special Comments

If  Parameter  Number  3  is  specified,  the  Durbin-Watson  statistic  will 
be  calculated  and  printed. 


K-CLASS  ESTIMATION 

I.   General  Description 

The K1Class and K2Class programs work together to do K-class estimation.
There are three estimators which belong to the K-class. These include:
Ordinary Least Squares (multiple regression), Two-stage Least Squares,
and Limited Information Maximum Likelihood.

II.   Description of K1Class Output

The K1Class program calculates the basic statistics:


Mean:   $\bar{X}_i = \dfrac{\sum X_i}{N}$,  where N = Sample Size

Variance-Covariance:   $S_{ij} = \dfrac{\sum X_i X_j}{N} - \bar{X}_i \bar{X}_j$

Standard Deviation:   $s_i = \sqrt{S_{ii}}$

Correlation:   $C_{ij} = \dfrac{S_{ij}}{s_i s_j}$

Cross-products in matrix notation:   CP = X'X
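
These definitions can be checked with a short Python sketch. It follows the formulas as written above, with N rather than N - 1 as the divisor; the function name and the data are hypothetical, and the sketch does not guard against a variable with zero variance.

```python
import math

def basic_statistics(data):
    """Return means, covariances, standard deviations, correlations, and
    cross-products for a data matrix given as a list of observation rows."""
    n = len(data)
    v = len(data[0])
    cross = [[sum(row[i] * row[j] for row in data) for j in range(v)]
             for i in range(v)]                                  # CP = X'X
    means = [sum(row[i] for row in data) / n for i in range(v)]
    cov = [[cross[i][j] / n - means[i] * means[j] for j in range(v)]
           for i in range(v)]
    sd = [math.sqrt(cov[i][i]) for i in range(v)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(v)] for i in range(v)]
    return {"means": means, "cov": cov, "sd": sd, "corr": corr, "cp": cross}

# Hypothetical data: the second variable is exactly twice the first.
STATS = basic_statistics([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

For these data the means are 2 and 4, the variance of the first variable is 14/3 - 4 = 2/3, and the correlation between the two variables is 1.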

K1Class also calculates the eigenvalue to be used in Limited Information
Maximum Likelihood estimation in K2Class.

Variables must be ordered: exogenous, then endogenous (independent, then dependent).

III.   Description of K2Class Output

K2Class calculates estimates and associated statistics for Ordinary
Least Squares (OLS), Limited Information Maximum Likelihood, and Two-stage
Least Squares.

Formulas

$\hat\beta$ = estimates of the endogenous variables
$\hat\gamma$ = estimates of the exogenous variables
$y_0$ = variable normalized on
$Y$ = endogenous variables in the equation
$X_*$ = exogenous variables in the equation
$X$ = exogenous variables in the system

Estimating Formulas

$$
\begin{bmatrix} \hat\beta \\ \hat\gamma \end{bmatrix}
= \begin{bmatrix} Y'Y - kV'V & Y'X_* \\ X_*'Y & X_*'X_* \end{bmatrix}^{-1}
  \begin{bmatrix} Y'y_0 - kV'y_0 \\ X_*'y_0 \end{bmatrix}
$$

or

$$\hat\theta = A^{-1} C$$

where

$$V'V = Y'Y - Y'X(X'X)^{-1}X'Y$$

$$V'y_0 = Y'y_0 - Y'X(X'X)^{-1}X'y_0$$

k determines the estimating technique.

Standard Error of Estimate

$$
\hat\sigma^2 = \frac{y_0'y_0 - (\hat\beta'\ \hat\gamma')\,C}{N - J}
$$

where J = Rank of $A^{-1}$

Standard Error of the Estimated Coefficients

$s_{\theta_i}$ = the square root of the ith diagonal element of the
$\hat\sigma^2 A^{-1}$ matrix

Variance matrix of the coefficients

$$\operatorname{Var}(\hat\theta) = \hat\sigma^2 A^{-1}$$
IV.   Ordinary Least Squares (OLS)

When K = 0, the K-class formula reduces to Ordinary Least Squares. Y
is then assumed to be another exogenous variable in the equation. When
ordinary least squares is specified the following additional statistics
are supplied.

In matrix notation

$$
R^2 = \frac{\hat\theta' C - (\sum y_0)^2 / N}{y_0'y_0 - (\sum y_0)^2 / N}
    = \frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}}
$$

Total Sum of Squares:

$$TSS = y_0'y_0 - (\textstyle\sum y_0)^2 / N$$

Regression (Explained) Sum of Squares:

$$RSS = \hat\theta' C - (\textstyle\sum y_0)^2 / N$$

Error (Unexplained) Sum of Squares:

$$ESS = TSS - RSS = y_0'y_0 - \hat\theta' C$$
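
The sums of squares can be illustrated with a small Python sketch for the simplest case, one exogenous variable plus a constant term. This is not the K2Class code: the function name and data are hypothetical, A is taken to be X'X and C to be X'y for X = [1, x], and the 2 x 2 system is solved in closed form.

```python
def ols_sums_of_squares(x, y):
    """Return TSS, RSS, ESS, and R2 for y regressed on a constant and x."""
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, syy = sum(y), sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    # A = X'X = [[n, sx], [sx, sxx]] and C = X'y = [sy, sxy]
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det          # constant term
    b1 = (n * sxy - sx * sy) / det            # slope
    theta_c = b0 * sy + b1 * sxy              # theta' C
    tss = syy - sy * sy / n
    rss = theta_c - sy * sy / n
    return {"TSS": tss, "RSS": rss, "ESS": tss - rss, "R2": rss / tss}

FIT = ols_sums_of_squares([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0])
```

For these data the fitted line is y = -0.5 + 2.2x, so TSS = 26, RSS = 24.2, ESS = 1.8 (the sum of the squared residuals), and R2 is about 0.93.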

Example:   Suppose that there were two possible models that one wanted to
estimate. In one model variable 10 is included and in the other variable
10 is not included.

K1C(S1)(S2)(0)( )( )(1)(1)( )(1).
END P
K2C(S2)(S3)(S4)( )(10)(2)*0*.
(10)(1)(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11).
(9)(1)(1)(2)(3)(4)(5)(6)(7)(8)(9)(11).
END P

V.   Limited  Information  Maximum  Likelihood  (LIML) 

When K = the smallest eigenvalue as calculated in K1Class, K2Class
calculates Limited Information Maximum Likelihood estimates.

The eigenvalue is calculated in the following manner:

Let

$$W_* = Y'Y - Y'X_*(X_*'X_*)^{-1}X_*'Y$$

$$W = Y'Y - Y'X(X'X)^{-1}X'Y$$

K1Class transfers to K2Class the smallest eigenvalue of the matrix $W_* W^{-1}$.

Example:   Suppose there is a two equation system with four exogenous and
two endogenous variables. The following program will calculate LIML
estimates for both equations:

K1C(C)(S1)(0)(2)(4)( )( )(S2).
(2)(2)(1)(4)(5)(6).
(2)(2)(2)(3)(5)(6).
END P
K2C(S1)(S3)(S4)(S2)(4)(2)*-1*.
(2)(2)(1)(4)(5)(6).
(2)(2)(2)(3)(5)(6).
END P

VI.   Two-stage Least Squares (2SLS)

When K = 1 is specified, Two-stage Least Squares estimates are computed.

Example:   Suppose one has a three equation model that contains three
endogenous and ten exogenous variables. The following program will calculate
2SLS estimates.

K1C(C)(S2).
END P
K2C(S2)(S3)(S4)( )(10)(3)*1.*.
(5)(2)(1)(2)(3)(4)(5)(11)(13).
(4)(2)(3)(4)(6)(7)(12)(13).
(5)(2)(1)(2)(8)(9)(10)(11)(12).
END P


VII.   Parameters

A.   MAIN Parameters

1.   K1Class

Parameter
Number    Use or Meaning

1         Input Address.   CARDS, SEQUENTIAL 1-15.

2         Output Address for raw data covariance matrix.
          SEQUENTIAL 1-15.

3         Type of Input:
              0 = raw data
              1 = cross-products
              2 = covariance

4         Number of equations if LIML; 0 otherwise.

5         Total number of exogenous variables in LIML system;
          1 ≤ n ≤ 94.

6         1 if want covariance matrix printed.

7         1 if want cross-products matrix printed.

8         Output Address for eigenvalues.   SEQUENTIAL 1-15.

9         1 if want correlations matrix printed.


2.   K2Class

Parameter
Number    Use or Meaning

1         Input Address; same as output address for K1Class.

2         Output Address for estimated coefficients.
          SEQUENTIAL 1-15.

3         Scratch Address.   SEQUENTIAL 1-15.

4         Input Address for eigenvalues; same as the eigenvalue
          output address in K1Class.

5         Number of exogenous variables in the system.

6         Number of equations.

7         Floating point value of k.   (See Special Comments.)
          This value should be enclosed in asterisks.


B.   Sub-parameters  (Equation  Control  Cards) 

Subparameters are needed for all K2Class programs and for K1Class
when LIML estimates are desired.   There is a subparameter card for
each equation.   If subparameters appear in both K1Class and K2Class,
they must appear in the same order.

Each equation control card has the following form:

Parameter
Number    Use or Meaning

1         Number of exogenous variables in the equation.

2         Number of endogenous variables in the equation.

3         The variable numbers of all variables in the equation,
          in the order:
              1 - exogenous variables in the equation
              2 - endogenous variables in the equation, with the
                  variable standardized on last.


VIII.   Special  Comments 


A.  If K = *0*     Ordinary Least Squares estimates are computed.
    If K = *1*     Two-Stage Least Squares estimates are computed.
    If K = *-1*    Limited Information Maximum Likelihood estimates are
                   computed.
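
For orientation, the three cases can be read against the usual textbook form
of the k-class estimator. The block below is a sketch in the notation common
to econometrics texts, not notation taken from this write-up: y is the
normalized endogenous variable, Y_1 the other endogenous variables in the
equation, X_1 the exogenous variables in the equation, X all exogenous
variables in the system, and \hat{V}_1 the reduced-form residuals of Y_1.

```latex
\hat{\delta}(k) =
\begin{bmatrix}
  Y_1'Y_1 - k\,\hat{V}_1'\hat{V}_1 & Y_1'X_1 \\
  X_1'Y_1                          & X_1'X_1
\end{bmatrix}^{-1}
\begin{bmatrix}
  (Y_1 - k\,\hat{V}_1)'y \\
  X_1'y
\end{bmatrix},
\qquad
\hat{V}_1 = Y_1 - X (X'X)^{-1} X' Y_1 .
```

Setting k = 0 removes the residual terms and gives ordinary least squares;
k = 1 gives two-stage least squares; setting k to the smallest eigenvalue
passed from K1Class gives LIML. In this program the value *-1* appears to act
as a signal to use that eigenvalue rather than as a literal value of k.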


B.  When LIML estimates are desired, the output and input addresses
for eigenvalues must be specified.

C.  K1Class accepts data from cards or intermediate storage as
raw data, cross-products, or covariance.   If cross-products or
covariance are used as input to K1Class, the matrix must be in
the following order:

[Diagram of the required matrix layout; legible labels: "Ex",
"cross-products", "covariance".]

Input to K2Class must come from K1Class.   However, once data has been
processed by K1Class, it may be used by K2Class any number of times.


IX.   References

Goldberger, A. S., Econometric Theory, New York, John Wiley and Sons,
Inc., 1964.

Johnston, J., Econometric Methods, New York, McGraw-Hill Book Company,
Inc., 1960.


LINEAR  PROGRAMMING 


I.   General  Description 

LINEAR  PROGRAMMING  maximizes  or  minimizes  a  linear  function  subject  to 
certain  linear  inequalities  called  constraints. 

In matrix notation:

Find the solution to

AX ≤, =, ≥ b   (a system of linear equations or inequalities)

which maximizes (or minimizes)

Z = CX

where X ≥ 0.

A is the matrix of coefficients of the constraints, X the vector of
variables, C the vector of costs or profits associated with each variable,
and b a vector or matrix of non-negative constants which places a bound on
the linear equations.

The equations AX ≤, =, ≥ b in n variables define and bound a space called
the feasible space in which all allowable values of the n variables are defined.
The  SIMPLEX  criterion  finds  those  combinations  of  variables  which  optimize  the 
objective  function  within  this  feasible  space.   To  solve  the  system  of  linear 
equations  defined  above,  the  inequalities  must  be  changed  to  equalities.   This 
is  accomplished  by  addition  of  surplus  variables  to  "greater  than"  constraints, 
and  slack  variables  to  "less  than"  constraints.   To  create  the  basis  for  solving 
a  system  of  linear  equations,  an  identity  matrix  must  be  formed  and  augmented 
to  the  A  matrix  of  structural  variables.   Creation  of  the  identity  matrix  is 
completed  by  addition  of  artificial  variables  to  constraints  with  a  "greater 
than"  relational  operator.   The  program  adds  any  needed  variables. 

Since there are more variables (structural + surplus + slack + artificial)
than rows, some method must select which variables will be in solution.   The
SIMPLEX Algorithm selects a number of variables (equal to the number of rows)
which will be in solution.   The final solution is the maximum (or minimum) of
the linear function subject to the constraints.   Since slack and surplus
variables have "real" meaning, they may appear in the final and intermediate
solutions.   Their presence as a non-zero value indicates that the constraint
to which they were added is not binding.   Artificial variables have no "real"
meaning.   Presence of artificial variables in solution indicates that some
constraints are so constructed as to preclude a solution which has "real"
meaning.

The  slack  and  surplus  variables  are  given  costs  of  zero  in  the  objective 
function.   Artificial  variables  are  given  large  negative  costs.   SIMPLEX  attempts 
to  drive  artificial  variables  from  solution. 

Failure  to  drive  artificial  variables  from  solution  may  indicate  a  problem 
in  which  constraints  are  mutually  exclusive  or  that  the  cost  assigned  to  the 
artificial variable is not large enough.

In matrix notation the augmented matrix before calculations begin would
appear as:

[ A | I | S ] = b

where I is the identity matrix of slack and artificial variables and S is the
matrix  of  surplus  variables.   Row  operations  are  performed  on  the  augmented 
matrix  according  to  the  SIMPLEX  criterion.   After  any  number  of  row  operations, 
the  inverse  matrix  of  the  original  coefficients  of  structural  variables  now 
in  solution  is  contained  in  the  columns  where  the  original  identity  matrix  was 
located.   At  every  stage  (row  operation)  an  identity  matrix  will  be  present. 
This  identity  matrix  indicates  the  variables  in  solution. 
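
The construction of the augmented matrix can be sketched as follows. This is
an illustration under stated assumptions, not the program's own code: the
function name `augment` is hypothetical, and the treatment of "EQ" rows (one
artificial variable each) is inferred from the description of I as the
identity matrix of slack and artificial variables.

```python
def augment(rows, ops):
    """Build [A | I | S] from constraint rows and their relational operators.

    rows: list of coefficient lists (the A matrix).
    ops:  list of "LE", "EQ" or "GE", one per row.
    Returns the augmented rows and the kind of each identity column.
    """
    m = len(rows)
    n_surplus = sum(1 for op in ops if op == "GE")
    # One identity column per row: a slack for LE, an artificial otherwise.
    kinds = ["slack" if op == "LE" else "artificial" for op in ops]
    table = []
    next_surplus = 0
    for i, (row, op) in enumerate(zip(rows, ops)):
        identity = [1.0 if j == i else 0.0 for j in range(m)]
        surplus = [0.0] * n_surplus
        if op == "GE":
            surplus[next_surplus] = -1.0   # surplus is subtracted
            next_surplus += 1
        table.append([float(a) for a in row] + identity + surplus)
    return table, kinds


table, kinds = augment([[1, 2], [3, 1], [1, 1]], ["LE", "GE", "EQ"])
```

The number of columns produced is structural variables + constraints +
"greater than" constraints, the same quantity the Restrictions section
compares against 300.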

Since the original table is stored by the program, it is possible to
compare the results of the inverse obtained through LINEAR PROGRAMMING with
the inverse obtained by a standard inversion technique.   The user may set the
absolute value for this comparison in Parameter 3.   If the comparison does not
meet the accuracy requirement, a new table is formed using the original table
and the calculated inverse.   After a feasibility check, the program continues
calculations until an acceptable solution is obtained.

References 

Llewellyn, R.   LINEAR PROGRAMMING.   New York, New York:   Holt, Rinehart,
and Winston, 1966.

Hadley, G.   LINEAR PROGRAMMING.   Reading, Massachusetts:   Addison-Wesley,
1953.

II.   Restrictions

The  program  is  limited  to  a  maximum  of  90  rows  or  constraints,  300  columns 
or  variables,  and  5  columns  in  the  requirement  matrix. 

These  limits  are  internal  limits  and  the  user  is  warned  that  large  problems 
may  exceed  the  program  capacity  during  accuracy  check  and  calculations  involving 
multiple  column  requirement  matrices.   Program  capacity  WILL  be  exceeded  if: 
the  number  of  constraints  +  number  of  structural  variables  +  number  of  "greater 
than"  inequalities  >  300. 

Input  may  come  ONLY  from  CARDS  in  the  form  of  subparameters. 

III.   Parameters

All floating point numbers (indicated by FP) must be enclosed by a pair of
asterisks.   All integer numbers (indicated by IN) must be enclosed in
parentheses.   The main call to the program and each subparameter must be
terminated by a period (.).

The program is entered by punching the symbols L-P followed by the
appropriate main parameters and subparameters.   All main parameters have
default options.

Main Parameters to follow L-P

1   Cost of artificial variables *large negative FP number*.
    Default = -1.E50.

2   Minimum value for calculations *FP*.   If any calculation
    falls below this value, it is set to zero.   Default =
    internal calculations.

3   Value for accuracy check *FP*.   If absolute value for calculated
    difference (See General Description) falls below this
    value, final value is termed inaccurate and calculations are
    performed to correct rounding errors.   Default = .5.

4   If 1, suppress print of solution matrix (IN).

5   If 1, suppress print of check matrix (IN).

6   Print every INth step, i.e. row operation (IN).

7   If 1, insert small positive, non-zero number for any zero in the b
    vector.   Useful aid if b vector contains many zeros.

Subparameters

The program now expects to find the word MINimize or MAXimize followed
by a string of constants which represent, in sequential order, the costs or
values associated with each variable.   All non-zero constants (with or without
decimal) must be enclosed by a pair of asterisks.   Zeros may be enclosed by
asterisks.   A series of sequential zeros may be represented by a pair of
parentheses, i.e. the integer number in the pair of parentheses represents the
number of sequential zeros to be inserted.   All coefficients must appear and
be in sequential order.

The cost coefficients representing the objective function are terminated by
a period.   The constraints are entered in a similar manner.   All variables
must be in sequence.   Coefficients of zero must be included.   Multiple
requirement vectors are entered in the standard form.   The constraint is
terminated by a period.   Comments which do not include a period (.), comma (,),
asterisk (*), or left parenthesis may be entered at any point outside those
characters delimiting constants.   The requirement vectors are separated from
the rest of the constraint by relational operators.   All coefficients must
appear and be in sequential order.

The program recognizes three relational operators:   LE (less than or equal),
EQ (equal), and GE (greater than or equal).   These relational operators are
surrounded by quotes (").   See Section V, Examples, of this program.
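
The card conventions just described (asterisk-delimited constants, a
parenthesized count for a run of zeros, a quoted relational operator, free
comments) can be illustrated with a small reader. This is only a sketch of
the syntax as described in this section; the function name
`parse_constraint` and the sample cards are illustrative, and the real
program's scanner may differ in details such as error handling.

```python
import re

# One alternative per token type: *constant*, (count of zeros), "operator".
TOKEN = re.compile(r'\*([^*]+)\*|\((\d+)\)|"(LE|EQ|GE)"')


def parse_constraint(card):
    """Split one constraint card into (coefficients, operator, requirements)."""
    coefficients, requirements, operator = [], [], None
    for match in TOKEN.finditer(card):
        constant, zero_count, relation = match.groups()
        # Everything after the relational operator is a requirement value.
        target = coefficients if operator is None else requirements
        if relation is not None:
            operator = relation
        elif zero_count is not None:
            target.extend([0.0] * int(zero_count))
        else:
            target.append(float(constant))
    return coefficients, operator, requirements


labor = parse_constraint('LABOR*.25**-60**-.04**9*"LE"*0**1**2*.')
cash = parse_constraint('CASH(2)*1*(1)"LE"*1**2**3*.')
```

Text such as LABOR or CASH is skipped because it contains none of the
delimiting characters, which is why such comments are permitted on the card.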

Output 

The output consists of the objective function, the final solution matrix,
the variables in solution, and the optimal functional value.   In addition,
Shadow Prices or opportunity costs are printed.   Shadow Prices provide useful
information on the "cost" of having certain constraints, or the increased
profit to be obtained by 'relaxing' a particular constraint.

For example:

Constraint 1:   1X(1) + 2X(3) ≤ 5.0

To this constraint, slack variable X(1) is added to make it an equality.   In
the final solution, X(1) is not in solution.   The optimal maximum functional
value is 20.   The 'Shadow Price' on variable X(1) is 2.0.   This means that if
we relax this constraint to 6.0, the optimal maximum value could be 22.0.   For
every unit the constraint is relaxed, the functional value will be changed by
the Shadow Price.   The Shadow Price holds until the constraint is no longer
binding.   The same logic may be applied to "GE" type constraints with surplus
variables.   For interpretation of Shadow Price for structural and artificial
variables, the user is referred to texts under headings such as "Dual Algorithm",
"Interpretation of the Dual", and "Opportunity Costs".
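
The arithmetic in this example can be written out directly. The helper below
is a hypothetical illustration, not part of the program's output, and it is
valid only over the range in which the constraint remains binding.

```python
def relaxed_optimum(optimum, shadow_price, old_bound, new_bound):
    """Predicted optimum after moving a binding constraint's bound."""
    return optimum + shadow_price * (new_bound - old_bound)


# Figures from the example: optimum 20, shadow price 2.0, bound 5.0 -> 6.0.
predicted = relaxed_optimum(20.0, 2.0, 5.0, 6.0)
```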

Basis variables refer to those variables which form the original identity
matrix.   The variable numbers are listed in the order they were added to the
constraints.   The number of basis variables will always equal the number of
constraints.   To determine whether a basis variable is a slack or artificial
variable, refer to the coefficients of these variables in the objective
function.   A slack variable will have a coefficient of 0.0.

MESSAGES 


PROBLEM TOO LARGE:   More than 300 variables or 100 constraints on input or
during addition of slack, surplus, and artificial variables.

NORM FOR CUTOFF:   Value of Main Parameter Number 2, either supplied or
default.

ERROR IN SIMPLX:   Source Program Error.   See a consultant.

SOLUTION UNBOUNDED:   Constraints do not form a closed space.   Optimal
functional value is infinite.

NUMBER OF ITERATIONS:   Number of row operations needed to calculate final
solution.   For multiple requirement vectors, number is not cumulative.

ACCURACY ACCEPTABLE or ACCURACY NOT ACCEPTABLE:   Comparison with Main
Parameter Number 3.

VARIABLE ADDED

VARIABLES ARE:   Iterations either inaccurate and new variable added
or, during execution of multiple requirement vector, a new variable had
to be added to make problem feasible (requirement vector positive).

NON-RESOLVABLE TIE:   Cannot occur mathematically.   The only reason for its
occurrence is rounding error in the machine.   Can be corrected by
incrementing or decrementing requirement vector by a small amount.
Perform this only for constants of same value.   (Use Parameter 7.)

Other  messages  should  be  self-explanatory. 

Special Comments

Speed and accuracy can be increased by observing the following suggestions:

1)  Never make Parameter 3 (accuracy check) larger than (0.1) x (number of
significant digits in table).   For example, if numbers in the table are
4, 5, .001, 86, 95.32, you have "one" significant digit.   Set Parameter 3
to *.1*.

2)  Scale numbers in table to get them into same range.   For example, if
table entries are of the order 10 and the requirement vectors are of
the order 10^3, scale the requirement vectors down to the order of the table
entries and rescale the solution by the same factor.   The objective function
may also be rescaled in a similar manner.   Rescaling essentially reflects the
number of significant digits.


V.   Examples


The problem:

Minimize  -.75X(1) + 150X(2) - .02X(3) + 6X(4)

Subject to the following constraints:

Constraint (1)

.25X(1) - 60X(2) - .04X(3) + 9X(4) ≤ 0, 1, 2

Constraint (2)

.05X(1) - 90X(2) - .02X(3) - 3X(4) ≤ 0, 1, 2

Constraint (3)

1X(3) ≤ 1, 2, 3

Could  be  set  up  on  cards  as  follows : 


/*ID
//  EXEC   SOUPAC
//SOUPAC.SYSIN  DD  *
L-P*-1.E20****.1*()()()(1).
MIN*-.75**150**-.02**6*.
LABOR*.25**-60**-.04**9*"LE"*0**1**2*.
LAND*.05**-90**-.02**-3.0*"LE"*0**1**2*.
CASH(2)*1*(1)"LE"*1**2**3*.
END PROGRAM
END SOUPAC
/*

This problem will result in an unbounded solution with requirement
vector number one.   The problem terminates without performing calculations
on the other vectors.

Note insertion of sequential zeros on CASH card.


QUADRATIC  PROGRAMMING 

I.  General  Description 

This program maximizes the quadratic function cx + 1/2 x^T D x subject to
the linear constraints Ax ≤ b, where c is an n-vector, D is a symmetric
negative definite n by n matrix, A is an m by n matrix of coefficients
of constraints, and b is an m-vector.

The Kuhn-Tucker theory shows that a solution to the constrained
maximization problem is obtained if and only if vectors x, L, v, and w can be
found such that:

Dx - A^T L + v = -c
Ax         + w = b

where the elements of x, L, v, and w are non-negative and the conditions
xv = 0 and Lw = 0 are satisfied.   To find these vectors, artificial vectors
z1 and z2 are added to the first equation and a y-vector is added to the
second.   Simplex techniques are then used to eliminate first the y
and then the z1 and z2 variables.

References : 

Carr, C. R. and C. H. Howe, Quantitative Decision Procedures in Management
and Economics, McGraw-Hill, 1964.

Hadley, G., Nonlinear and Dynamic Programming, Addison-Wesley, 1964.

Wolfe, P., "The Simplex Method for Quadratic Programming", Econometrica,
27, 1959, pp. 382-398.

NOTE:   Carr and Howe claim that elements of the w-vector may not be
entered in the first stage of the simplex procedure.   Since this requires
that there exist a solution to Ax = b, it is a severe restriction.   It
is also unnecessary, and this program does enter w-variables during the
first stage.   Otherwise, the procedures used closely follow those of
Carr and Howe.

II.  Restrictions 

The maximum number of x-variables is 40.   The number of x-variables
plus the number of constraints must be ≤ 80.

The  D-matrix  must  be  negative  definite.   If  this  is  dubious,  use  the 
PRINCIPLE  AXIS  FACTOR  ANALYSIS  program  to  extract  the  eigenvalues.   All 
must  be  negative.   Semi-definite  D-matrices  may  be  perturbed  or  the  user 
may  limit  the  number  of  iterations  to  be  performed.   If  this  limit  is 
exhausted,  final  solution  vectors  will  be  printed  out  (see  below). 

The only form of input is a matrix of data.   If there are n x-variables
and m constraints, the matrix should have n + 1 columns and m + n rows,
partitioned as follows:

D (n x n)     c (n x 1)

A (m x n)     b (m x 1)

Note that this is the c vector, not the -c vector mentioned in the
Kuhn-Tucker formulas.   Also note the + sign and the 1/2 coefficient of the
x^T D x term.   All constraints in this type of input are assumed to be ≤ type.
Multiply ≥ constraints through by -1.   The equality constraint

\sum_{j=1}^{n} a_{ij} x_j = b_i

is equivalent to the two constraints \sum_j a_{ij} x_j \le b_i and
\sum_j -a_{ij} x_j \le -b_i.
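
The layout and sign conventions above can be sketched as a small routine that
assembles the data matrix. The function name `qp_input_matrix` and the
operator codes are hypothetical conveniences for this illustration; the
program itself reads only the finished matrix.

```python
def qp_input_matrix(D, c, constraints):
    """Assemble the (n + m) x (n + 1) data matrix [D c; A b].

    constraints: list of (coefficients, op, bound), op in "LE", "GE", "EQ".
    GE rows are multiplied through by -1; EQ rows become two LE rows.
    """
    rows = [list(d_row) + [c_i] for d_row, c_i in zip(D, c)]
    for coefficients, op, bound in constraints:
        if op in ("LE", "EQ"):
            rows.append(list(coefficients) + [bound])
        if op in ("GE", "EQ"):
            rows.append([-a for a in coefficients] + [-bound])
    return rows


# A two-variable problem with one "less than or equal" constraint.
matrix = qp_input_matrix([[-2, 0], [0, -2]], [8, 10], [([3, 2], "LE", 6)])
# An equality constraint expands to a pair of opposing inequalities.
pair = qp_input_matrix([], [], [([1, 1], "EQ", 4)])
```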

This matrix can be read in from cards or from temporary storage.   Only
one problem can be read from a tape or sequential location.   Multiple problems
must be read from cards, the matrix for each problem preceded by its own
"DATA (N + 1) (FORMAT)" card and followed by its own "END#" card.

The elements of the w-vector are always non-negative and are to be
considered "slack" for ≤ constraints and "surplus" for ≥ constraints.

The user may obtain the basis vector at the end of each iteration
showing which variables are in the basis and their quantities (option 2).
He may alternatively have the entire matrix printed out after each iteration
(option 3).   The user is cautioned that option 3 can use immense quantities
of paper and time unless the problem is very small.

A  method  outlined  in  Hadley,  pages  183  -  186,  is  used  to  avoid  cycling 
in  cases  of  degeneracy. 

III.   Parameters

The program call card should have the name QUADRATIC PROGRAMMING
followed by these parameters:

Parameter
Number    Use or Meaning

1         Number of problems following

2         Input Address.   SEQUENTIAL 1-15 or CARDS.

3         Output option:
              0 if final results only
              1 if iterated basis vectors
              2 for entire iterated matrix

4         Limit on number of iterations if desired.   Leave blank
          otherwise.   Default is 1000.

5         Perturbation quantity.   Punch quantity to be subtracted
          from diagonal of D-matrix between asterisks instead of
          parentheses; e.g., *.001*.   Leave blank if not desired.


IV.   Examples

Example I


Suppose we wish to maximize the quadratic function

F = 10x1 + 20x2 + 15x3 - 1x1^2 - 2x2^2 + 1x1x2

subject to the constraints

2x1 + 3x2 + 1x3 ≤ 50
1x1       + 3x3 ≤ 70
3x1 + 2x2       ≤ 60


Since  the  D-matrix  is  only  negative  semi-definite,  it  should  be  perturbed 
to  insure  convergence  to  a  solution.   The  following  set  of  cards  would  solve 
the  problem  using  data  matrix  input : 

/*ID
//  EXEC  SOUPAC
//SYSIN  DD  *
QUADRATIC PROGRAMMING (1)(CARDS)(0)(0)*.001*.
END SOUPAC
DATA(4)(4F3.0)
 -2  1  0 10
  1 -4  0 20
  0  0  0 15
  2  3  1 50
  1  0  3 70
  3  2  0 60
/*

Example II:

Maximize

F = 8x1 + 10x2 - x1^2 - x2^2

subject to the constraint

3x1 + 2x2 ≤ 6

The D-matrix is negative definite.   The problem would be set up as follows:

/*ID
//  EXEC   SOUPAC
//SYSIN  DD  *
QUAD (1)(C)(2).
END SOUPAC
DATA(3)(F2.0,2F3.0)
-2  0  8
 0 -2 10
 3  2  6
END#
/*

The extreme value of the objective function for this example is .213E02.
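
The quoted extreme value can be checked by hand from the Kuhn-Tucker
equations of Section I. The sketch below assumes that the single constraint
is binding and that both x-variables are positive (so v = 0 and w = 0), and
solves the resulting linear system exactly. The small elimination routine is
an illustrative helper, not the simplex procedure the program uses.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Example II: D = [[-2, 0], [0, -2]], c = (8, 10), A = [3, 2], b = 6.
# Rows 1-2: D x - A^T L = -c (with v = 0); row 3: A x = b (with w = 0).
system = [[-2, 0, -3],
          [0, -2, -2],
          [3, 2, 0]]
x1, x2, multiplier = solve(system, [-8, -10, 6])
objective = 8 * x1 + 10 * x2 - x1 ** 2 - x2 ** 2
```

Under these assumptions x1 = 4/13, x2 = 33/13 and the multiplier is 32/13,
all non-negative, and the objective is 277/13, which rounds to the value
.213E02 given above.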


THREE  STAGE  LEAST  SQUARES  ESTIMATION 

I.  GENERAL  DESCRIPTION 

The Three Stage Least Squares Estimation program calculates three stage
least squares estimates and an asymptotic covariance matrix.   A raw data
covariance matrix and two stage least squares residual covariance matrix
are the necessary input.   Calculations are carried out as in "Econometric
Theory" by Arthur S. Goldberger, pp. 347-352.   The coefficients may also be
stored for use with the Econometric Reduced Form and Residual Analysis
program.

References:

Goldberger, Arthur S., Econometric Theory, New York, John Wiley and Sons,
Inc., 1964.

Johnston, J., Econometric Methods, New York, McGraw-Hill Book Company, Inc.,
1960.

II.  RESTRICTIONS

The raw data covariance matrix must be arranged as described in the K-Class
Estimation program write-up.   The program has the following size restrictions:

Total number of coefficients estimated ≤ 140.

(NEQ x NVAR) + (NEQ x NEQ) + (NVAR x NVAR) ≤ 20,000

NVAR  =  the  total  number  of  variables 
NEQ  =  the  number  of  equations  estimated 

Note:   Endogenous  coefficients  are  printed  out  first,  followed  by  exogenous 
coefficients. 
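
The two size restrictions can be checked before a run with a few lines of
arithmetic. The helper below is a hypothetical illustration; it reads the
coefficient limit as 140, and the sample figures (two equations, ten
variables, ten coefficients) are made up for the illustration.

```python
def fits_three_stage(neq, nvar, ncoef):
    """True if a problem is within both stated size restrictions."""
    storage = neq * nvar + neq * neq + nvar * nvar
    return ncoef <= 140 and storage <= 20000


small_ok = fits_three_stage(neq=2, nvar=10, ncoef=10)      # storage = 124
large_ok = fits_three_stage(neq=100, nvar=100, ncoef=50)   # storage = 30000
```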

III.   PARAMETERS 

The  parameters  appear  on  the  program  card  following  the  name  Three  Stage 
in  the  following  order : 

Parameter
Number    Use or Meaning

1         Input Address for raw data covariance matrix.
          SEQUENTIAL 1-15.

2         Output Address for coefficients.
          SEQUENTIAL 1-15.

3         Input Address for residual covariance matrix.
          SEQUENTIAL 1-15.

4         Number of equations to be estimated

5         Number of exogenous variables

Subparameters

For each equation a card specifying the variables in the equation must
follow the main parameter card with the following parameters:

Parameter
Number       Use or Meaning

1            Number of exogenous variables in the equation

2            Number of endogenous variables in the equation

3 to N + 2   Variable number of the N variables included in
             the equations with exogenous variables first;
             endogenous variables next, with the variable on
             which the system is normalized last.

IV.  SPECIAL  COMMENTS 

The Three Stage Least Squares Estimation program requires input from
several other SOUPAC programs.   The following is an example of the steps
needed to calculate the necessary input.

V.  EXAMPLE

K1CLASS(T1)(T2)(0)(0)()(1)(1).
K2CLASS(T2)(T3)(T4)()(8)(2)*1.*.
(4)(2)(1)(2)(3)(4)(1)(2).
(4)(2)(5)(6)(7)(8)(2)(1).
END P
ECON (T3)(T2)()(8)()(T5).
THREE(T2)(T6)(T5)(2)(8).
(4)(2)(1)(2)(3)(4)(1)(2).
(4)(2)(5)(6)(7)(8)(2)(1).
END P

Notice that the equation control cards for both K2CLASS and Three-stage
Least Squares must be in the same order.

Also notice that an END P card is required after the equation cards.

AUTOCORRELATION  AND  SPECTRAL  ANALYSIS


I.   General Description

The  calculation  of  autocorrelation  coefficients  and  the  determination 
of  power  spectra  are  of  interest  to  economists  in  the  study  of  time  series 
and  others  whose  interest  leads  them  to  suspect  some  repetition  of  variation 
within a set of observations.   For a single variable, the autocorrelation,
r_p, is calculated as follows:

r_p = \frac{(N-p) \sum X_i X_{i+p} - \sum X_i \sum X_{i+p}}
           {[(N-p) \sum X_i^2 - (\sum X_i)^2]^{1/2} [(N-p) \sum X_{i+p}^2 - (\sum X_{i+p})^2]^{1/2}}

N is the total number of observations; p is an arbitrarily chosen time lag.
The power spectrum is a Fourier transformation of the autocovariances and
is used in the harmonic analysis of X_i as a function of time.   Raw estimates
of the spectral density are given by formula (2) and a smoothed value is
given by formula (3):

(2)   L_p = W_0 + 2 \sum_{q=1}^{m-1} W_q \cos(q p \pi / m) + W_m \cos(p \pi)

(3)   U_p = 0.23 L_{p-1} + 0.54 L_p + 0.23 L_{p+1}

This program, however, not only will calculate correlations of X_i with
X_{i+p}, but will calculate values for all possible pairs of variates,
X_{i,j} with X_{i+p,j'}.

For  a  more  detailed  discussion,  see: 

R. W. Southworth, "Autocorrelation and Spectral Analysis"
from Mathematical Methods for Digital Computers, by Anthony
Ralston and Herbert Wilf; John Wiley and Sons, 1960, pp. 213-20.
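
The calculations described in this section can be sketched in a few lines.
This is an illustration only, not the program's code: the function names are
hypothetical, W is taken to be the list of autocovariances for lags 0
through m, the frequency index is written h to keep it apart from the lag p,
and the smoothing is applied to interior points only because the treatment
of the two end points is not stated here.

```python
from math import cos, pi, sqrt


def autocorrelation(x, p):
    """Correlation of X(i) with X(i+p) over the N - p overlapping pairs."""
    lead, lag = x[:len(x) - p], x[p:]
    n = len(lead)                                   # N - p
    sum_xy = sum(a * b for a, b in zip(lead, lag))
    sum_x, sum_y = sum(lead), sum(lag)
    sum_xx = sum(a * a for a in lead)
    sum_yy = sum(b * b for b in lag)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (sqrt(n * sum_xx - sum_x ** 2)
                   * sqrt(n * sum_yy - sum_y ** 2))
    return numerator / denominator


def raw_spectrum(w):
    """Raw estimates L(0..m) from autocovariances w[0..m], as in formula (2)."""
    m = len(w) - 1
    return [w[0]
            + 2 * sum(w[q] * cos(q * h * pi / m) for q in range(1, m))
            + w[m] * cos(h * pi)
            for h in range(m + 1)]


def smooth(raw):
    """Smoothed interior estimates, as in formula (3)."""
    return [0.23 * raw[h - 1] + 0.54 * raw[h] + 0.23 * raw[h + 1]
            for h in range(1, len(raw) - 1)]


trend = autocorrelation([1, 2, 3, 4, 5, 6], 1)          # steadily rising series
alternating = autocorrelation([1, -1, 1, -1, 1, -1], 1)  # sign flips each step
flat = smooth(raw_spectrum([1.0, 0.0, 0.0]))
```

A steadily rising series gives a lag-1 autocorrelation of +1 and a strictly
alternating series gives -1, which is a quick way to confirm the signs in
the formula.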

II.   Restrictions 

This  program  is  designed  to  accept  from  1  to  15  variables.   The  number 
of  observations  is  limited  to  2500  or  less.   Since  all  possible  pairs  of 
lagged cross-correlations are printed for values of p from p to p + kd (p,
k, and d are set by program parameters), the output under maximum circumstances
will be very large.   The value of p is limited to 1500 or less.   Current
literature suggests that for a correlation between X_i and X_{i+p}, the value of
p should not exceed 10 per cent of the total number of observations.   It
should be pointed out, however, that a set of observations, X_i, can be
broken up into blocks of equal size, and the several blocks can be treated as
additional variables.   In this way, the value of p can exceed 1500 and also
the value of N can exceed 2500.

III.   Parameters

Data may be read by the program either from cards or from any temporary
storage medium.   In the output, X is the lead variable and Y is the
lag variable.

The call card will have the program name first.   After this the
parameters are in the following order:

Parameter
Number    Use or Meaning

1         Input Address.   CARDS or SEQUENTIAL 1-15.

2         An integer number denoting the minimum number
          of lag periods, p.

3         An integer number denoting the maximum lag,
          p + kd, at which time the execution of the
          program is terminated.

4         An integer number denoting the increment, d,
          to be added to p, so that the lag period can
          be altered.

Parameters 3 and 4 will be useful generally when one wants to study the
changes in the power spectra as a function of lag length.

5         The presence of a number greater than 0
          indicates that the means, standard deviations,
          and correlations are to be printed.

6         Standard SOUPAC output parameters consisting of
          an output address, and/or /PRINT, and/or /PUNCH.
          If the punch or print option is chosen, then
          the quantities output are the lag period,
          autocorrelation coefficient, autocovariance function,
          raw spectrum, and smoothed spectrum in I5, 4E18.8
          format.

7         A number greater than 0 indicates that the lead-lag
          sums and cross-products are to be printed.

If  a  parameter  is  left  blank,  this  is  the  same  as  specifying  a  zero.   It 
should  be  pointed  out  that  the  output  for  several  variables  is  quite  large, 
and  unless  there  is  interest  in  the  output,  several  of  the  parameters  should 
be left as blanks.

CLIQUE  ANALYSIS


I.   General Description

This routine is designed to enumerate all third order or higher
interrelationships (communication chains) which exist in a sociometric
matrix.   The algorithm is identical to the method described by Harary
and Ross.   A communication chain is considered to be any submatrix of
order three or more in which all the off-diagonal cells are full.

II.   Restrictions

The maximum dimension of an input array is 190 x 190.   Input may
come from cards or any temporary storage area.   The array must contain
only  zeroes  and  ones  in  its  elements.   Any  number  greater  than  zero  is 
considered  to  be  one;  therefore,  care  should  be  used  in  constructing  the 
array.   Symmetry  in  the  input  matrix  is  not  necessary  since  the  program 
automatically  forces  symmetry  through  element-wise  products.   It  is 
suggested  that  TRANSFORMATIONS  be  used  to  modify  input  arrays  when  various 
cut-off  points  are  used  to  distinguish  ones  from  zeroes. 

III.   Parameters 

The  name  CLIQUE  ANALYSIS  appears  first  on  the  program  call  card  and 
is  followed  by  the  following  parameter: 

Parameter 
Number  Use  or  Meaning 

1  Input  Address  of  data  array. 

CARDS  or  SEQUENTIAL  1-15. 

IV.   Special  Comments 

The  following  is  an  illustration  of  the  clique  detection  concept. 
Data  matrix: 


0 

1 

1 

1  ° 

0 

0 

0 

0 

1 

0 

lie 

0 

0 

0 

c 

0 

1 

1 

0 

1 

I 

1 

0 

0 

0 

0 

0 

1 

0 

1 

0 

1 

0 

0 

0 

0 

1 

0 

1 

0 

1 

1 

1 

0 

0 

0 

1 

1 

1 

0 

1 

] 

0 

0 

0 

0 

0 

1 

] 

0 

1 

0 

, 

0 

0 

0 

] 

1 

1 

0 

Oj 

oj 

0 

0! 

0 

ll 

1 

ll 

0 
Clique (1)   1, 2, 3

Clique (2)   8, 6, 7, 9

Clique (3)   4, 3, 5

Clique (4)   3, 5, 6

Clique (5)   4, 5, 7

Clique (6)   5, 6, 7
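
The clique listing above can be reproduced with a short Python sketch.
Two things in it are assumptions and not part of the manual: the 9 x 9
matrix is a hypothetical one built to be consistent with the six cliques
listed (with two unreciprocated choices added to show the effect of
forced symmetry), and the enumeration uses the Bron-Kerbosch procedure
in place of the Harary and Ross group-matrix method that the program
itself uses.

```python
def symmetrize(matrix):
    """Keep a link only when it is reciprocated (element-wise product)."""
    n = len(matrix)
    return [[matrix[i][j] * matrix[j][i] for j in range(n)] for i in range(n)]


def maximal_cliques(adj, min_order=3):
    """Enumerate maximal cliques of order >= min_order (Bron-Kerbosch)."""
    n = len(adj)
    neighbors = [{j for j in range(n) if j != i and adj[i][j]} for i in range(n)]
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            if len(clique) >= min_order:
                found.append(sorted(v + 1 for v in clique))  # 1-based labels
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & neighbors[v], excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(found)


# Hypothetical data: mutual choices implied by the six cliques listed.
MUTUAL = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (3, 6), (5, 6),
          (4, 7), (5, 7), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]
ONE_WAY = [(1, 4), (2, 4)]  # unreciprocated choices, removed by symmetrize

data = [[0] * 9 for _ in range(9)]
for i, j in MUTUAL:
    data[i - 1][j - 1] = data[j - 1][i - 1] = 1
for i, j in ONE_WAY:
    data[i - 1][j - 1] = 1

cliques = maximal_cliques(symmetrize(data))
```

The members of each clique come out in increasing order, so the clique
printed above as 8, 6, 7, 9 appears here as 6, 7, 8, 9.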

Harary and Ross, "A Procedure for Clique Detection Using the Group Matrix",
Sociometry, Vol. 20, No. 3, 1957, pp. 205-215.


PAIRED COMPARISONS

I.   General Description

Paired comparisons is a method of obtaining empirical estimates of
the form "stimulus j is judged greater than any other stimulus i."  Each
stimulus in turn serves as the standard; that is, all possible pairs of
stimuli are compared.  With n stimuli, there are n(n-1)/2 pairs.
Comparison of a stimulus with itself is disregarded; it is assumed
that a proportion of 0.5 would result.  In the following, m = number of
subjects = sample size.

Each subject's preferences are tabulated, producing $A_k$, an n x n
matrix of 1's and 0's, k = 1, ..., m, and the total number of times he
preferred each stimulus is computed.  Totals for each stimulus and a
grand total are computed for each subject, and this m x (n + 1) matrix is
referred to as the individual preference sums.  The individual tables
$A_k$ are summed over all subjects to form an n x n frequency matrix F,
whose elements $f_{ij}$ denote the observed number of times stimulus j
was judged greater than stimulus i.

The matrix of proportions, P, is then computed from F, so that $p_{ij}$
is the observed proportion of times stimulus j was judged greater than
stimulus i.  The matrix X (called Z in the formulas and parameters below)
is derived from P by reference to the normal curve; $x_{ij}$ is the unit
normal deviate corresponding to the element $p_{ij}$.  These are the
sample estimates of the values required to determine the scale values of
the stimuli.  The scale values are computed by summation, producing
$s_j$, a least squares estimate of the scale value of stimulus j.


II.   Input 


A.  Both  an  indication  of  ordering  for  each  pair  and  an  array  of 
subjects'  choices  are  required.   The  former  must  be  given  as  a  set 
of  pair  subparameter  cards  and  the  latter  as  an  observation  of 
data  for  each  subject  in  a  data  deck. 

B.  In  the  subjects  deck  one  number  is  used  to  denote  the  subject's 
choice  for  each  pair.   This  choice  may  be  "is  greater  than," 
"is  better  than,"  "is  brighter  than,"  etc.   This  number  is  1  if 
the  subject  chose  the  left,  or  first  stimulus,  2  if  the  subject 
chose  the  right  or  second  stimulus.   No  other  coding  is  acceptable. 

C.  The pair cards consist of one mention each of every possible pair
of stimuli.  The order of the pairs is the same as the order of
the subjects' choices, i.e. pair 1 corresponds to item 1 of the
subject array.  The order of the elements in the pairs is reflected
in the subjects' choice deck: if the pair (5,7) corresponds to a 1, then
the subject chose stimulus 5 over stimulus 7; if (7,5) were the
pair, a corresponding 2 would mean stimulus 5 was preferred.  Note that
one set of pair specifications serves for all subjects.

III.   Formulas and Calculations


A.   INDIVIDUAL PREFERENCE SUMS

Let A be an individual preference frequency table, and let $a_{ij}$ be
an element of A, i = 1, ..., n, j = 1, ..., n, where n is the number of
stimuli.

     $a_{ij} = 1$  if the individual chose stimulus j over stimulus i.

     $a_{ij} = 0$  if the individual chose stimulus i over stimulus j.

     $a_{ii} = a_{jj} = 0$  since no stimulus is compared with itself.

Individual preference sum for stimulus j:

     $$ \sum_{i=1}^{n} a_{ij} $$

Error messages concerning incorrect frequency tables refer to the
configuration of table A.  A can be correct only if the subject data
and pair cards are correct.
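
As an illustration of how the pair cards and the 1/2 coding determine
table A, the following Python sketch builds A and the individual
preference sums for the first subject of Example A in Section VII.  The
function names are hypothetical, and the validity checks only
approximate the error checking the program performs.

```python
def preference_table(n, pairs, responses):
    """Return A (n x n): a[i][j] = 1 when stimulus j+1 was chosen over i+1."""
    if len(pairs) != n * (n - 1) // 2 or len(responses) != len(pairs):
        raise ValueError("need n(n-1)/2 pairs and one response per pair")
    a = [[0] * n for _ in range(n)]
    for (left, right), choice in zip(pairs, responses):
        if choice not in (1, 2):
            raise ValueError("responses must be coded 1 or 2")
        # 1 means the left (first) stimulus was chosen, 2 the right (second).
        winner, loser = (left, right) if choice == 1 else (right, left)
        a[loser - 1][winner - 1] = 1
    return a


def preference_sums(a):
    """Column sums of A followed by their grand total (length n + 1)."""
    n = len(a)
    sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    return sums + [sum(sums)]


PAIRS = [(1, 2), (3, 1), (4, 1), (3, 2), (2, 4), (4, 3)]
FIRST_SUBJECT = [1, 2, 2, 2, 1, 1]          # data card 122211

A1 = preference_table(4, PAIRS, FIRST_SUBJECT)
row_of_sums = preference_sums(A1)
```

For this subject stimulus 1 is preferred three times, stimulus 2 twice,
stimulus 3 never and stimulus 4 once, for a grand total of six choices.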

B.   STIMULUS PREFERENCE FREQUENCY TABLE, F

Given the matrix $A_k$, with elements $a_{ij}^{(k)}$, for each of the m
subjects, the stimulus preference frequencies are

     $$ f_{ij} = \sum_{k=1}^{m} a_{ij}^{(k)} $$


C.   TABLE OF PROPORTIONS, P

If m is the sample size, i.e. the number of subjects, then

     $$ p_{ij} = \frac{f_{ij}}{m} $$

D.   TABLE OF NORMAL DEVIATES, Z

Let $p_{ij}$ be an element of the table of proportions.  Then let

     $$ e_{ij} = \sqrt{\log\left(1/p_{ij}^{2}\right)} $$

     $$ z_{ij} = e_{ij} - \frac{2.515517 + 0.802853\,e_{ij} + 0.010328\,e_{ij}^{2}}{1 + 1.432788\,e_{ij} + 0.189269\,e_{ij}^{2} + 0.001308\,e_{ij}^{3}} $$

producing a z for each $e_{ij}$ (log is the natural logarithm).  Critical
values of p occur at 0, 1 and .5, so adjustments are made for these
values before the formula is applied and sometimes after.
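
The approximation in Section D can be tried out with the Python sketch
below.  The coefficients are those given above; the handling of the
critical values is an assumption, since the manual does not say how the
program adjusts them.  Here a proportion of exactly .5 returns 0,
proportions above .5 are reflected, and proportions of 0 or 1 are pulled
in by 1/(2m).  The function names are hypothetical.

```python
import math
from statistics import NormalDist


def upper_tail_deviate(p):
    """Deviate with upper-tail area p, for 0 < p <= 0.5 (formula of Section D)."""
    e = math.sqrt(math.log(1.0 / (p * p)))
    numerator = 2.515517 + 0.802853 * e + 0.010328 * e ** 2
    denominator = 1.0 + 1.432788 * e + 0.189269 * e ** 2 + 0.001308 * e ** 3
    return e - numerator / denominator


def normal_deviate(p, m):
    """Unit normal deviate for a proportion p observed on m subjects."""
    # Assumed adjustment: keep p away from 0 and 1 by 1/(2m).
    floor = 1.0 / (2.0 * m)
    p = min(max(p, floor), 1.0 - floor)
    if p == 0.5:
        return 0.0
    if p < 0.5:
        return -upper_tail_deviate(p)
    return upper_tail_deviate(1.0 - p)


# Compare with the exact inverse normal from the Python standard library.
checks = [(p, normal_deviate(p, 100), NormalDist().inv_cdf(p))
          for p in (0.1, 0.25, 0.4, 0.6, 0.9)]
```

In this comparison the approximation agrees with the exact inverse
normal to about three decimal places.
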
E.   SCALE VALUES, S

     $$ s_j = \frac{1}{n} \sum_{i=1}^{n} z_{ij} $$

where n is the number of stimuli.  Then

     $$ \text{total scale} = \sum_{j=1}^{n} s_j $$

A row of scales and a total, of length n + 1, is calculated.
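
The whole chain of calculations, from subject responses to scale values,
can be sketched in Python as follows, using the pairs and data cards of
Example A in Section VII.  This is an illustration under stated
assumptions, not the program's code: the exact inverse normal from the
standard library stands in for the rational approximation of Section D,
the diagonal of P is set to .5, and proportions of 0 or 1 are pulled in
by 1/(2m), which is an assumed adjustment.

```python
from statistics import NormalDist


def paired_comparison_scale(n, pairs, subjects):
    """Return (F, P, Z, S) where S holds n scale values plus their total."""
    m = len(subjects)
    # F: f[i][j] = number of subjects who chose stimulus j+1 over i+1.
    f = [[0] * n for _ in range(n)]
    for responses in subjects:
        for (left, right), choice in zip(pairs, responses):
            winner, loser = (left, right) if choice == 1 else (right, left)
            f[loser - 1][winner - 1] += 1
    # P: proportions, with .5 assumed for a stimulus against itself.
    p = [[0.5 if i == j else f[i][j] / m for j in range(n)] for i in range(n)]
    # Z: normal deviates (assumed adjustment keeps p inside (0, 1)).
    floor = 1.0 / (2.0 * m)
    inv = NormalDist().inv_cdf
    z = [[inv(min(max(p[i][j], floor), 1.0 - floor)) for j in range(n)]
         for i in range(n)]
    # S: column means of Z, followed by the total scale.
    s = [sum(z[i][j] for i in range(n)) / n for j in range(n)]
    return f, p, z, s + [sum(s)]


PAIRS = [(1, 2), (3, 1), (4, 1), (3, 2), (2, 4), (4, 3)]
SUBJECTS = [[1, 2, 2, 2, 1, 1],   # data card 122211
            [2, 2, 2, 1, 1, 1],   # data card 222111
            [1, 2, 1, 1, 2, 2]]   # data card 121122

F, P, Z, S = paired_comparison_scale(4, PAIRS, SUBJECTS)
```

Because each off-diagonal pair of proportions sums to one, the deviates
cancel in pairs and the total scale is zero apart from rounding; with
these data stimulus 1 receives the largest scale value.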

IV.   Output

Matrices for individual preference totals and S are always printed.
Other intermediate results, F, P and Z, may be printed on option.  All
matrices may be stored on option.  All results are printed in F format.
F, P and Z are n x n matrices; individual totals and S are m x (n + 1)
and n + 1, respectively.  (The A matrix [text illegible in source]
calculations.)

V.   Restrictions

A.  [illegible]  There is no restriction on the [illegible] subjects
    [illegible] be paired with each other [illegible] pairs (two
    stimuli) may be inserted to a maximum of 881 pairs.  The number of
    pairs must equal

         n(n-1)/2

    Any other relationship is invalid.

D.  If the pair (5,7) is given, then (7,5) is invalid.  Also, if this is
    [illegible].

VI.   Parameters

The following parameters appear on the program call card:

Parameter
Number     Use or Meaning

1          Input Address of data.  CARDS, SEQUENTIAL 1-5.

2          Output Address of individual preference sums.
           Always printed.  SEQUENTIAL 1-5.

3          Output Address of stimulus preference frequency table.
           SEQUENTIAL 1-5 and/or PRINT; if not desired, leave
           parameter blank.

4 *        Output Address of proportions.  SEQUENTIAL 1-5 and/or
           PRINT; if not desired, leave parameter blank.

5 *        Output Address of normal deviates.  SEQUENTIAL 1-5
           and/or PRINT; if not desired, leave parameter blank.

6          Scale Values.  SEQUENTIAL 1-5, always printed.


*   It is possible to punch the output from these parameters while
executing this program.  If you need this option, see the section in
the Introduction on Input and Output.  Any storable output may be
punched using the Matrix program.


VII.   Examples

A.

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
PAIRED COMPARISONS (C)( )(P)(P)(P).
PAIRS (1,2)(3,1)(4,1)(3,2)(2,4)(4,3)
END P
END S
DATA(6)(6F1.0)
122211
222111
121122
END #
/*

Print has been indicated for all output except individual preference
sums and scale values, which are always printed.

The pairs card indicates that there are 4 stimuli.  All possible
pairs of these stimuli are presented to the subjects, and the subjects'
responses are recorded in the order (1,2), (1,3), (1,4), (2,3), (2,4),
(3,4).  Some of these pair members have been inverted, indicating that
no special order is required within a pair; the left member or right
member preference of subjects would, of course, be affected by the
inversion.

The pairs need not be given in the increasing order of the example,
but at all times the order of the pairs is the order of the corresponding
subject responses.

The data deck is a set of subject responses for each pair of stimuli.

B.

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
TRA (C).
CON(900)*1*.
ADD (1,625)(900)(1,625).
OUT(S1)(1,625).
END P
PAI(S1)( )(S2/P)( )(P).
PAI(5,3)(2,8)(4,16)(7,26)(8,3)( )(4,13).
PAI(25,24)(23,28)( )(4,12).
END P
END S
DATA(325)(75F1.0)
(data deck: 5 cards per subject)
END#
/*


This example shows a program for 26 stimuli; 26 x 25/2 = 325 is
the number of pairs required and the number of subject preferences.
Since no more than 300 pairs may be given per pairs statement, at least
two pairs statements are needed; two are shown.  The unique pairs may
occur in any order; the subject responses are in the same order.

The Transformations program shown is designed to correct subject
responses punched zero/one or blank/one to 1 and 2.

A selection of possible output has been made.  Note that
individual preference sums and scale values, as well as normal deviates
and stimulus preference frequencies, are printed.  The latter are also
stored.  This storage implies that some further use is made of the
frequencies, perhaps in a part of the program that is not shown.


VIII.   References

Torgerson, Warren S.  Theory and Methods of Scaling.  John Wiley and
Sons, New York: 1960, pp. 166-173.

Edwards, Allen L.  Techniques of Attitude Scale Construction.
Appleton-Century-Crofts, New York: 1957, pp. 19-52.


SCALOGRAM  ANALYSIS 


I.   General Description

The SCALOGRAM ANALYSIS program (mnemonic: SCA) was developed to provide
a method of producing Guttman scales automatically, without the need for
external decisions about which items do and which items do not enter
into Guttman scales.  Items are grouped together in as few submatrices
as possible, with each subgroup having maximum homogeneity within its
submatrix.  Each item from the total group is chosen to fit into only one
submatrix.

The SCALOGRAM program starts by choosing an item from the total
group and then searches the remainder of the items to find an item
similar to the item chosen.  Similarity is tested by using an error
criterion and a chi-square test.  If both criteria are met, this item is
added to the first item and a scale is formed.  This last item is then
used to find another similar item, and the procedure continues until
either of the two criteria is not met.  Whenever a criterion fails, the
scale is terminated and a new scale is started.

SCALOGRAM will only work for dichotomous data, and it can be used
to analyze both subject-wise and item-wise.  SCALOGRAM differs from
Guttman analysis in three ways:  1)  It uses an empirical rather than a
rational basis for selecting items to enter a scale.  2)  It uses a
statistical method of deciding on groups and of testing the scalability
of the item.  3)  It yields multiple scales rather than rejecting
the scale hypothesis for the whole item set.

SCALOGRAM can be considered to be more descriptive than the raw data
but less than factor analysis.  SCALOGRAM also is unlike factor analysis
in that SCALOGRAM is not bound to linear assumptions about the regressions
involved.  Factor analysis is set up to study quantitative variables and
will not show correct relationships between qualitative variables;
SCALOGRAM will show what relationships do exist between qualitative
variables.  (See Guttman 1950 for a complete discussion of the relation
between the scalogram technique and other statistical procedures.)  (See
Lingoes 1963 for the complete algorithm for SCALOGRAM.)
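
The chaining procedure described above can be outlined in Python.  The
sketch shows only the control flow (choose an item, chain similar items,
start a new scale when a criterion fails).  The two similarity tests are
stand-ins chosen for the illustration: a pairwise count of Guttman-type
errors and a 2 x 2 Pearson chi-square, with hypothetical thresholds.  The
criteria actually used by the program are those of Lingoes (1963) and
are not given in this section.

```python
def chi_square_2x2(x, y):
    """Pearson chi-square (1 d.f.) for the 2 x 2 table of two 0/1 items."""
    n = len(x)
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def error_rate(x, y):
    """Share of subjects passing the less popular item but not the other."""
    easier, harder = (x, y) if sum(x) >= sum(y) else (y, x)
    errors = sum(1 for e, h in zip(easier, harder) if h == 1 and e == 0)
    return errors / len(x)


def similar(x, y, max_error=0.10, min_chi2=3.84):
    """Both stand-in criteria must be met (thresholds are assumptions)."""
    return error_rate(x, y) <= max_error and chi_square_2x2(x, y) >= min_chi2


def chain_scales(items):
    """items: label -> list of 0/1 responses.  Each item enters one scale."""
    remaining = list(items)
    scales = []
    while remaining:
        current = remaining.pop(0)            # start a new scale
        scale = [current]
        while True:
            match = next((label for label in remaining
                          if similar(items[current], items[label])), None)
            if match is None:                 # a criterion failed: close scale
                break
            remaining.remove(match)
            scale.append(match)
            current = match                   # last item added finds the next
        scales.append(scale)
    return scales


ITEMS = {"a": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
         "b": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
         "c": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]}
scales = chain_scales(ITEMS)
```

With these hypothetical items, a and b form one scale and c, which is
unrelated to both, starts a scale of its own.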

References : 

Guttman, L.  "Relation of Scalogram Analysis to other Techniques."
In Stouffer, et al., Measurement and Prediction.  Princeton, N.J.:
Princeton University Press, 1950 (pp. 172-212).

Lingoes, J. C.  "Multiple Scalogram Analysis: A Set-Theoretic Model
For Analyzing Dichotomous Items."  Educational and Psychological
Measurement, XXIII (1963), 501-524.

Lingoes,  J.  C.   "A  Multiple  Scalogram  Analysis  of  Selected  Issues 
of  the  83rd  U.S.  Senate."  American  Psychologist,  XVII  (1962),  327- 


II.   Parameters

The program mnemonic is SCA.  The following parameters appear on
the program card:

Parameter
Number     Use or Meaning

1          Input Address

2          Address of Labels

3          A 1 indicates that the matrix should be transposed

Since the program scales by columns (items), the matrix must be
transposed in order to scale by subjects.

Labels can be used to describe items; they can be input from cards
or tape.  A maximum of 28 characters is allowed per label, and labels
should be described as follows:  DATA(n)(nA4) where n <= 7.  It is most
convenient to use a separate card for each label, with the description
in the first 28 columns.  If both labels and data are being input from
cards, labels must precede data.

III.   Restrictions

Both the number of items and the number of subjects are restricted
to 490.  Labels are restricted to 28 characters.

Data must be coded as 0's and 1's.  If the data are not of this form,
TRANSFORMATIONS may be used to recode them.

IV.   Examples

SCA(C)(C)(1).
END S
DATA(7)(7A4)
(labels)
END#
DATA(40)(40F1.0)
(data)
END#

Labels and data are on cards, 28 columns are used for labels, and scaling
will be done by rows.

SCA(S1).
END S
DATA(30)(30F1.0)
(data)
END#

Data is on SEQUENTIAL 1 and scaling will be done by columns.

PROBIT  ANALYSIS  SECTION 


PROBIT


(Mnemonic: PRB)


I.   General Description

This program calculates maximum likelihood estimates for the parameters
A and B in the probit equation:

     Y = A + BX

An iterative scheme is used.


II.   Restrictions

The input vectors must be of equal length k, where 3 < k < 3000.  Each
input vector comes from a separate input address.


III.   Parameters

Parameter
Number     Use or Meaning

1          Input vector of dosage level.  CARDS or SEQUENTIAL 1-15.

2          Input vector of number of subjects tested at each dose
           level.  CARDS or SEQUENTIAL 1-15.

3          Input vector containing the number of subjects at each
           level responding to the drug.  CARDS or SEQUENTIAL 1-15.

4          Output vector of length k containing the proportion of
           subjects responding to the various dose levels of the
           drug.  SEQUENTIAL 1-15 and/or PRINT.

5          Output vector of length k containing the values of the
           expected probit for the various levels of the drug.
           SEQUENTIAL 1-15 and/or PRINT.


Printed output consists of:

1 -  Estimate of intercept constant A

2 -  Estimate of probit regression coefficient B

3 -  Chi-square value for a test of significance of the final
     probit equation

     $$ \chi^{2} = \sum_{i=1}^{k} \frac{(R_i - N_i P_i)^{2}}{N_i P_i (1 - P_i)} $$

     where $R_i$ = number of responses (input address 3)
           $N_i$ = number of objects tested (input address 2)
           $P_i$ = cumulative normal distribution value corresponding
                   to $Z_i$, where $Z_i = (A + B X_i) - 5$ and
                   A and B are from the final probit equation

4 -  Degrees of freedom for chi-square
     d.f. = k - 2
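
The chi-square statistic can be checked by hand or with a short Python
sketch such as the one below.  It assumes the usual probit form of the
statistic, with the squared difference between observed and expected
responses divided by $N_i P_i (1 - P_i)$, and it takes A and B as given
rather than estimating them.  The function name and the sample figures
are hypothetical.

```python
from statistics import NormalDist


def probit_chi_square(a, b, doses, tested, responded):
    """Return (chi-square, degrees of freedom) for a fitted probit line."""
    k = len(doses)
    if not (len(tested) == len(responded) == k):
        raise ValueError("the three input vectors must have equal length k")
    chi2 = 0.0
    for x, n, r in zip(doses, tested, responded):
        z = (a + b * x) - 5.0               # probit minus 5 = normal deviate
        p = NormalDist().cdf(z)             # expected proportion responding
        chi2 += (r - n * p) ** 2 / (n * p * (1.0 - p))
    return chi2, k - 2


# Hypothetical check: a flat line at probit 5 predicts 50% at every dose.
chi2, df = probit_chi_square(5.0, 0.0, [1.0, 2.0, 3.0], [10, 10, 10], [7, 5, 3])
```

In this check each dose level has an expected count of 5 and a
denominator of 2.5, so the two levels that are off by 2 responses
contribute 1.6 each.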

References : 

D.  J.  Finney,  Probit  Analysis,  Second  Edition,  (Cambridge  University  Press 
1952). 

The program was adapted from the IBM Scientific Subroutine Package,
360A-CM-03X, Version III, page 44.

IV.   Example

If two or more input addresses are cards, the cards must be stacked
in order of their parameter numbers.  For example:

/*ID
//  EXEC  SOUPAC
//SOUPAC.SYSIN  DD  *
MAT.
MOVE(CARDS)(SEQ2)
END P
PRB(CARDS)(SEQ2)(CARDS)(PRINT).
END S
DATA(1)( )
(cards for SEQ 2)
END#
DATA(1)( )
(cards for Parameter 1)
END#
DATA(1)( )
(cards for Parameter 3)
END#
/*


NOTE:  The mnemonic for PROBIT is PRB, not PRO.


RANDOM  NUMBER  GENERATION  SECTION 


RANDOM  NUMBER  GENERATOR 


(Mnemonic:  RND) 


I.   General  Description 

This program calculates a matrix of normally distributed random
numbers.  An approximation formula is used to normalize uniformly
distributed random numbers:

     $$ Y = \frac{\sum_{i=1}^{k} X_i - k/2}{\sqrt{k/12}} $$

where the $X_i$ are the uniformly distributed random numbers, and Y is
the normally distributed number with mean zero (0) and standard deviation
one (1).  k is set to 12 by the program.  Y is then transformed to the
requested scale by multiplying by the standard deviation and adding the
mean.
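
The formula can be illustrated with the Python sketch below.  It follows
the same steps (sum k = 12 uniform numbers, subtract k/2, divide by the
square root of k/12, then rescale), but it uses Python's own uniform
generator in place of the IBM routine, so the numbers will not match
those produced by RND for the same starting integer.  The function name
and arguments are hypothetical.

```python
import random


def normal_matrix(seed, mean, sd, rows, cols, k=12):
    """Rows x cols matrix of approximately normal random numbers."""
    rng = random.Random(seed)               # same seed, same numbers

    def one_number():
        total = sum(rng.random() for _ in range(k))
        y = (total - k / 2.0) / (k / 12.0) ** 0.5   # mean 0, s.d. 1
        return mean + sd * y                # transform to requested scale

    return [[one_number() for _ in range(cols)] for _ in range(rows)]


first = normal_matrix(123456789, 10.0, 2.0, rows=200, cols=100)
again = normal_matrix(123456789, 10.0, 2.0, rows=200, cols=100)
values = [v for row in first for v in row]
sample_mean = sum(values) / len(values)
```

With k = 12 the divisor is 1, so Y is simply the sum of twelve uniform
numbers minus 6 and can never fall more than six standard deviations
from the mean.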

II.   Parameters

Parameter
Number     Use or Meaning

1          Input Address of 9 (nine) digit odd integer used as a
           starting point for the random number generator.  CARDS
           or SEQUENTIAL 1-15.

2          Output Address of random numbers matrix.
           SEQUENTIAL 1-15.  PRINT is not valid.

3          Mean of random numbers enclosed in asterisks,
           e.g., *0.0*.

4          Standard deviation, e.g., *1.0*.

5          Number of rows in output matrix of random numbers.

6          Number of columns in output matrix of random numbers.

7          Output Address of 9 digit integer which is the
           finishing point of the random number generator.
           Do not specify PRINT since the number is automatically
           printed.


III.   Special Comments


If  this  program  is  used  with  the  same  integer  starting  point,  it  will 
generate  the  same  numbers.   Thus,  use  Parameter  7  to  output  the  finishing 
location,  and  then  pass  that  address  as  the  starting  location  for  the  next 
use  of  this  program. 


V.   Reference

IBM System/360 Scientific Subroutine Package (360A-CM-03X) Version III,
page 77.


UTILITY 


UTILITY  PROGRAM 


I.   General  Description 

The  UTILITY  program  has  been  designed  to  handle  small  utility  functions 
which  do  not  necessitate  or  justify  the  creation  of  a  unique  program  within 
the  SOUPAC  system.   The  following  statements  will  invoke  the  UTILITY  program. 

UTILITY.
(insert subparameter card or cards here)
END P

The following sections describe the functions of the various subparameters.


II.   PRESORT  Program 

The PRESORT Program is presently the only program in the UTILITY program.
It is used to set up the data cards to be input into the IBM SORT/MERGE
package, which will be executed following the present SOUPAC program and
before another SOUPAC program which will use the sorted data as input.


SORT (0 or 1)(0 or 1)(V1) ... (Vn).


Parameter
Number     Use or Meaning

1          0 if data is to be sorted in ascending order.
           1 if data is to be sorted in descending order.

2          0 if data to be sorted is in single precision.
           1 if data to be sorted is in double precision.

3 through n <= 20
           Indicates the variable or variables to be sorted on,
           with the later variables, if any, varying most
           rapidly.
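
The effect of listing several sort variables, with the later ones
varying most rapidly, can be pictured with a small Python sketch.  It
illustrates the ordering only; it is not the IBM SORT/MERGE package, and
the function name and sample observations are hypothetical.

```python
def sort_observations(rows, descending, variables):
    """Sort rows on 1-based variable numbers; the first listed is the major key."""
    def key(row):
        return tuple(row[v - 1] for v in variables)
    return sorted(rows, key=key, reverse=bool(descending))


observations = [[1, 9, 2],
                [0, 5, 2],
                [3, 7, 1],
                [4, 1, 1]]

# Comparable in intent to SORT(0)(...)(3)(1): ascending on variable 3,
# then on variable 1 within equal values of variable 3.
ascending = sort_observations(observations, 0, [3, 1])
descending = sort_observations(observations, 1, [3, 1])
```

In the ascending result variable 3 changes slowly (1, 1, 2, 2) while
variable 1 varies within each block, which is the sense of "varying most
rapidly" above.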


Following the SOUPAC program in which the UTILITY program appears,
the following card must appear:

//  EXEC  SOUPSORT,INPUT=Snn,OUTPUT=Smm

where nn and mm represent the two digit equivalents of the sequential
unit numbers to be input to the sort and output from the sort to the next
SOUPAC program.  The two units must not be the same.

The next card will start the next SOUPAC program, which will operate
under the assumption that the sorted data has been supplied on the
sequential unit specified as the output of the SOUPSORT program.


//  EXEC  SOUPAC,DISP=OLD
//SYSIN  DD  *
(Your program which uses the sorted data)

The example given below sorts card input data so that it may be input
into a FREQUENCY program which uses variable 5 as a control variable.

/*ID  identification card information
//  EXEC  SOUPAC
//SYSIN  DD  *
MATRIX.
MOVE(CARDS)(S1).
END PROGRAM
UTILITY.
SORT(0)(1)(5).
END PROGRAM
END SOUPAC
DATA(10)(10F5.0).
(user's data deck)
END #
/*
//  EXEC  SOUPSORT,INPUT=S01,OUTPUT=S02
//  EXEC  SOUPAC,DISP=OLD
//SYSIN  DD  *
FREQUENCY(S2).
TWO.
PER(1)(1)(1).
CONTROL(5).
END PROGRAM
END SOUPAC
/*

The data on S1 is sorted in ascending order on variable 5.  The data is
passed to the SOUPSORT job step on S1 in double precision.  This data is
sorted on variable 5 and then output onto S2 in double precision.  It is
then input into the FREQUENCY program of the next SOUPAC job step,
whereupon analysis continues.

III.   Notes,  Restrictions,  and  Ideas 

1.  The default output from MATRIX is in double precision.

2.  The output from TRANSFORMATIONS is in single precision.

3.  Only one utility program is allowed per SOUPAC program.

4.  If other sequential units besides the one passed to the sort job
    step have been used during the first SOUPAC program, they are still
    intact and usable in the second SOUPAC program because of the
    DISP=OLD parameter.